[2024-01-30 15:36:25,854] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-30 15:36:25,911] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-30 15:36:26,501] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-30 15:36:26,822] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-30 15:36:27,731] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-30 15:36:28,188] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-30 15:36:28,195] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-30 15:36:28,204] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
[2024-01-30 15:37:47,056] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-01-30 15:37:47,056] [INFO] [comm.py:616:init_distributed] cdb=None
[2024-01-30 15:37:47,057] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-01-30 15:37:47,057] [INFO] [comm.py:616:init_distributed] cdb=None
[2024-01-30 15:37:47,057] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-01-30 15:37:47,057] [INFO] [comm.py:616:init_distributed] cdb=None
[2024-01-30 15:37:47,057] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-01-30 15:37:47,057] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-01-30 15:37:47,057] [INFO] [comm.py:616:init_distributed] cdb=None
[2024-01-30 15:37:47,057] [INFO] [comm.py:616:init_distributed] cdb=None
[2024-01-30 15:37:47,057] [INFO] [comm.py:643:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-01-30 15:37:47,089] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-01-30 15:37:47,089] [INFO] [comm.py:616:init_distributed] cdb=None
[2024-01-30 15:37:47,131] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-01-30 15:37:47,132] [INFO] [comm.py:616:init_distributed] cdb=None
[2024-01-30 15:37:47,145] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-01-30 15:37:47,145] [INFO] [comm.py:616:init_distributed] cdb=None
[2024-01-30 15:37:53,504] [INFO] [partition_parameters.py:326:__exit__] finished initializing model with 13.02B parameters
[2024-01-30 15:39:01,873] [WARNING] [partition_parameters.py:921:_post_init_method] param `class_embedding` in InternVisionEmbeddings not on GPU so was not broadcasted from rank 0
[2024-01-30 15:39:01,874] [WARNING] [partition_parameters.py:921:_post_init_method] param `position_embedding` in InternVisionEmbeddings not on GPU so was not broadcasted from rank 0
[2024-01-30 15:39:02,238] [INFO] [partition_parameters.py:326:__exit__] finished initializing model with 18.92B parameters
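The eight `Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm` lines (one per rank) indicate that the training code swaps the stock `LlamaRMSNorm` for apex's fused kernel whenever apex can be imported. Below is a minimal sketch of that kind of substitution, assuming apex and transformers are installed; the helper name and the decision to monkey-patch the transformers module are illustrative and not taken from this log:

```python
def try_use_fused_rmsnorm():
    """Prefer apex's fused RMSNorm over the pure-PyTorch LlamaRMSNorm when apex is available."""
    try:
        from apex.normalization import FusedRMSNorm
        import transformers.models.llama.modeling_llama as modeling_llama

        class FusedLlamaRMSNorm(FusedRMSNorm):
            def __init__(self, hidden_size, eps=1e-6):
                # FusedRMSNorm takes (normalized_shape, eps, elementwise_affine), so it can act
                # as a drop-in replacement for LlamaRMSNorm's (hidden_size, eps) interface.
                super().__init__(hidden_size, eps=eps, elementwise_affine=True)

        # Any LlamaRMSNorm constructed after this point uses the fused kernel.
        modeling_llama.LlamaRMSNorm = FusedLlamaRMSNorm
        print("Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm")
    except ImportError:
        # No apex on this rank: silently keep the stock implementation.
        pass

try_use_fused_rmsnorm()
```

Because each process runs the check independently, the message is printed once per rank, which is why it appears eight times above.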
use LN for projection: False
use LN for projection: False
use LN for projection: False
use LN for projection: False
use LN for projection: False
use LN for projection: False
use LN for projection: False
use LN for projection: False
Loading mm_projector weights...
Loading mm_projector weights...
Loading mm_projector weights...
Loading mm_projector weights...
Loading mm_projector weights...
Loading mm_projector weights...
Loading mm_projector weights...
Loading mm_projector weights...
Formatting inputs...Skip in lazy mode
Parameter Offload: Total persistent parameters: 2274560 in 517 params
{'loss': 0.9821, 'learning_rate': 1.282051282051282e-07, 'epoch': 0.0}
{'loss': 1.2832, 'learning_rate': 2.564102564102564e-07, 'epoch': 0.0}
[2024-01-30 15:42:46,561] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 1.3125, 'learning_rate': 3.846153846153847e-07, 'epoch': 0.0}
{'loss': 1.2832, 'learning_rate': 5.128205128205128e-07, 'epoch': 0.0}
{'loss': 1.3486, 'learning_rate': 6.41025641025641e-07, 'epoch': 0.0}
{'loss': 1.3193, 'learning_rate': 7.692307692307694e-07, 'epoch': 0.0}
{'loss': 1.3369, 'learning_rate': 8.974358974358975e-07, 'epoch': 0.0}
{'loss': 1.3057, 'learning_rate': 1.0256410256410257e-06, 'epoch': 0.0}
{'loss': 1.2705, 'learning_rate': 1.153846153846154e-06, 'epoch': 0.0}
{'loss': 1.3086, 'learning_rate': 1.282051282051282e-06, 'epoch': 0.0}
{'loss': 1.2393, 'learning_rate': 1.4102564102564104e-06, 'epoch': 0.0}
{'loss': 1.2734, 'learning_rate': 1.5384615384615387e-06, 'epoch': 0.0}
{'loss': 1.2012, 'learning_rate': 1.6666666666666667e-06, 'epoch': 0.0}
{'loss': 1.2051, 'learning_rate': 1.794871794871795e-06, 'epoch': 0.0}
{'loss': 1.1504, 'learning_rate': 1.9230769230769234e-06, 'epoch': 0.0}
{'loss': 1.1572, 'learning_rate': 2.0512820512820513e-06, 'epoch': 0.0}
{'loss': 1.1875, 'learning_rate': 2.1794871794871797e-06, 'epoch': 0.0}
{'loss': 1.0679, 'learning_rate': 2.307692307692308e-06, 'epoch': 0.0}
{'loss': 1.0625, 'learning_rate': 2.435897435897436e-06, 'epoch': 0.0}
{'loss': 1.1309, 'learning_rate': 2.564102564102564e-06, 'epoch': 0.0}
{'loss': 1.019, 'learning_rate': 2.6923076923076923e-06, 'epoch': 0.0}
{'loss': 1.1182, 'learning_rate': 2.8205128205128207e-06, 'epoch': 0.0}
{'loss': 1.0581, 'learning_rate': 2.948717948717949e-06, 'epoch': 0.0}
{'loss': 0.9941, 'learning_rate': 3.0769230769230774e-06, 'epoch': 0.0}
{'loss': 1.0869, 'learning_rate': 3.205128205128206e-06, 'epoch': 0.0}
{'loss': 1.0483, 'learning_rate': 3.3333333333333333e-06, 'epoch': 0.01}
{'loss': 1.0308, 'learning_rate': 3.4615384615384617e-06, 'epoch': 0.01}
{'loss': 1.0425, 'learning_rate': 3.58974358974359e-06, 'epoch': 0.01}
{'loss': 1.0352, 'learning_rate': 3.7179487179487184e-06, 'epoch': 0.01}
[2024-01-30 15:51:14,488] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
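The `stage3.py` allocator warnings above repeat throughout the run and carry their own suggested fix: flush the cache via `get_accelerator().empty_cache()` at the same point on every rank. A minimal sketch of a training loop that follows that advice, assuming a model already wrapped by `deepspeed.initialize()`; the function name, loop shape, and flush interval are illustrative:

```python
from deepspeed.accelerator import get_accelerator

def train_with_periodic_cache_flush(model_engine, train_dataloader, flush_every=50):
    """DeepSpeed training loop that empties the allocator cache on all ranks at the same
    step, as the stage3.py warning suggests. `flush_every` is an assumed interval."""
    for step, batch in enumerate(train_dataloader):
        loss = model_engine(**batch).loss      # model_engine: engine returned by deepspeed.initialize()
        model_engine.backward(loss)
        model_engine.step()
        if (step + 1) % flush_every == 0:
            # Every rank reaches this branch at the same step, so caches are flushed together.
            get_accelerator().empty_cache()
```

Note that the warning's primary suggestion is still to reduce memory consumption; the synchronized flush only addresses the symptom when the pressure itself cannot be removed.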
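The `learning_rate` values in the entries that follow increase by roughly 1.282e-07 per step (2e-05 / 156) until they reach the 2e-05 peak around epoch 0.03, and then fall along a cosine-shaped curve. The sketch below reproduces the warm-up portion of that trajectory, assuming a cosine schedule with linear warm-up; the 156 warm-up steps are inferred from the logged values and the total step count is a placeholder, since the true total is not shown in this excerpt:

```python
import torch
from torch import nn
from transformers import get_cosine_schedule_with_warmup

model = nn.Linear(8, 8)  # stand-in module; the real run optimizes the 18.92B-parameter model above
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # peak LR observed in the log
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=156,     # inferred: 2e-05 / 1.282051282051282e-07 ≈ 156
    num_training_steps=5200,  # placeholder; the real total is not visible in this excerpt
)

for step in range(3):
    optimizer.step()
    scheduler.step()
    # Prints ~1.28e-07, 2.56e-07, 3.85e-07, matching the first logged learning rates.
    print(step + 1, scheduler.get_last_lr()[0])
```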
{'loss': 0.9985, 'learning_rate': 3.846153846153847e-06, 'epoch': 0.01}
{'loss': 0.9956, 'learning_rate': 3.974358974358974e-06, 'epoch': 0.01}
[2024-01-30 15:51:49,368] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 1.0405, 'learning_rate': 4.102564102564103e-06, 'epoch': 0.01}
{'loss': 0.9629, 'learning_rate': 4.230769230769231e-06, 'epoch': 0.01}
{'loss': 0.9907, 'learning_rate': 4.358974358974359e-06, 'epoch': 0.01}
{'loss': 0.9653, 'learning_rate': 4.487179487179488e-06, 'epoch': 0.01}
{'loss': 0.9526, 'learning_rate': 4.615384615384616e-06, 'epoch': 0.01}
{'loss': 0.9785, 'learning_rate': 4.743589743589744e-06, 'epoch': 0.01}
{'loss': 0.2355, 'learning_rate': 4.871794871794872e-06, 'epoch': 0.01}
[2024-01-30 15:54:09,842] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 1.001, 'learning_rate': 5e-06, 'epoch': 0.01}
[2024-01-30 15:54:27,829] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption.
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 1.0, 'learning_rate': 5.128205128205128e-06, 'epoch': 0.01} {'loss': 0.9692, 'learning_rate': 5.256410256410257e-06, 'epoch': 0.01} {'loss': 0.9619, 'learning_rate': 5.384615384615385e-06, 'epoch': 0.01} {'loss': 1.0107, 'learning_rate': 5.512820512820514e-06, 'epoch': 0.01} {'loss': 0.9077, 'learning_rate': 5.641025641025641e-06, 'epoch': 0.01} {'loss': 0.9839, 'learning_rate': 5.769230769230769e-06, 'epoch': 0.01} {'loss': 0.9917, 'learning_rate': 5.897435897435898e-06, 'epoch': 0.01} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/899330266.jpg' {'loss': 0.9424, 'learning_rate': 6.025641025641026e-06, 'epoch': 0.01} {'loss': 0.9214, 'learning_rate': 6.153846153846155e-06, 'epoch': 0.01} {'loss': 0.938, 'learning_rate': 6.282051282051282e-06, 'epoch': 0.01} {'loss': 0.9258, 'learning_rate': 6.410256410256412e-06, 'epoch': 0.01} {'loss': 0.9194, 'learning_rate': 6.538461538461539e-06, 'epoch': 0.01} {'loss': 0.9609, 'learning_rate': 6.666666666666667e-06, 'epoch': 0.01} {'loss': 0.9624, 'learning_rate': 6.794871794871796e-06, 'epoch': 0.01} {'loss': 0.9346, 'learning_rate': 6.923076923076923e-06, 'epoch': 0.01} {'loss': 0.9243, 'learning_rate': 7.051282051282053e-06, 'epoch': 0.01} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/2061514022.jpg' {'loss': 0.9551, 'learning_rate': 7.17948717948718e-06, 'epoch': 0.01} {'loss': 0.2278, 'learning_rate': 7.307692307692308e-06, 'epoch': 0.01} {'loss': 0.9268, 'learning_rate': 7.435897435897437e-06, 'epoch': 0.01} {'loss': 0.9648, 'learning_rate': 7.564102564102564e-06, 'epoch': 0.01} {'loss': 0.9731, 'learning_rate': 7.692307692307694e-06, 'epoch': 0.01} {'loss': 0.9546, 'learning_rate': 7.820512820512822e-06, 'epoch': 0.01} {'loss': 0.9092, 'learning_rate': 7.948717948717949e-06, 'epoch': 0.01} {'loss': 0.9258, 'learning_rate': 8.076923076923077e-06, 'epoch': 0.01} {'loss': 0.8784, 'learning_rate': 8.205128205128205e-06, 'epoch': 0.01} {'loss': 0.9434, 'learning_rate': 8.333333333333334e-06, 'epoch': 0.01} {'loss': 0.9404, 'learning_rate': 8.461538461538462e-06, 'epoch': 0.01} {'loss': 0.917, 'learning_rate': 8.58974358974359e-06, 'epoch': 0.01} {'loss': 0.8423, 'learning_rate': 8.717948717948719e-06, 'epoch': 0.01} {'loss': 0.9238, 'learning_rate': 8.846153846153847e-06, 'epoch': 0.01} {'loss': 0.9214, 'learning_rate': 8.974358974358976e-06, 'epoch': 0.01} {'loss': 0.9111, 'learning_rate': 9.102564102564104e-06, 'epoch': 0.01} {'loss': 0.8931, 'learning_rate': 9.230769230769232e-06, 'epoch': 0.01} {'loss': 0.8706, 'learning_rate': 9.358974358974359e-06, 'epoch': 0.01} {'loss': 0.9414, 'learning_rate': 9.487179487179487e-06, 'epoch': 0.01} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/60170921.jpg' {'loss': 0.8916, 'learning_rate': 9.615384615384616e-06, 'epoch': 0.01} {'loss': 0.8857, 'learning_rate': 9.743589743589744e-06, 'epoch': 0.01} {'loss': 0.8779, 'learning_rate': 9.871794871794872e-06, 'epoch': 0.01} {'loss': 0.8638, 'learning_rate': 1e-05, 'epoch': 0.02} {'loss': 0.9282, 'learning_rate': 1.012820512820513e-05, 'epoch': 0.02} {'loss': 0.9673, 'learning_rate': 1.0256410256410256e-05, 'epoch': 0.02} {'loss': 0.9067, 'learning_rate': 1.0384615384615386e-05, 'epoch': 0.02} {'loss': 0.9531, 'learning_rate': 1.0512820512820514e-05, 'epoch': 0.02} {'loss': 0.9766, 
'learning_rate': 1.0641025641025643e-05, 'epoch': 0.02} {'loss': 0.9829, 'learning_rate': 1.076923076923077e-05, 'epoch': 0.02} {'loss': 0.9136, 'learning_rate': 1.0897435897435898e-05, 'epoch': 0.02} {'loss': 0.9033, 'learning_rate': 1.1025641025641028e-05, 'epoch': 0.02} {'loss': 0.8901, 'learning_rate': 1.1153846153846154e-05, 'epoch': 0.02} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/823929493.jpg' {'loss': 0.9429, 'learning_rate': 1.1282051282051283e-05, 'epoch': 0.02} {'loss': 0.9199, 'learning_rate': 1.1410256410256411e-05, 'epoch': 0.02} {'loss': 0.2101, 'learning_rate': 1.1538461538461538e-05, 'epoch': 0.02} {'loss': 0.8726, 'learning_rate': 1.1666666666666668e-05, 'epoch': 0.02} {'loss': 0.9263, 'learning_rate': 1.1794871794871796e-05, 'epoch': 0.02} {'loss': 0.9346, 'learning_rate': 1.1923076923076925e-05, 'epoch': 0.02} {'loss': 0.9365, 'learning_rate': 1.2051282051282051e-05, 'epoch': 0.02} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/395539331.jpg' {'loss': 0.8945, 'learning_rate': 1.217948717948718e-05, 'epoch': 0.02} {'loss': 0.2422, 'learning_rate': 1.230769230769231e-05, 'epoch': 0.02} {'loss': 0.9053, 'learning_rate': 1.2435897435897436e-05, 'epoch': 0.02} {'loss': 0.9146, 'learning_rate': 1.2564102564102565e-05, 'epoch': 0.02} {'loss': 0.9277, 'learning_rate': 1.2692307692307693e-05, 'epoch': 0.02} {'loss': 0.8384, 'learning_rate': 1.2820512820512823e-05, 'epoch': 0.02} {'loss': 0.2266, 'learning_rate': 1.294871794871795e-05, 'epoch': 0.02} {'loss': 0.8911, 'learning_rate': 1.3076923076923078e-05, 'epoch': 0.02} {'loss': 0.9995, 'learning_rate': 1.3205128205128207e-05, 'epoch': 0.02} {'loss': 0.9219, 'learning_rate': 1.3333333333333333e-05, 'epoch': 0.02} [2024-01-30 16:14:26,237] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.9155, 'learning_rate': 1.3461538461538463e-05, 'epoch': 0.02} [2024-01-30 16:14:44,868] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.1984, 'learning_rate': 1.3589743589743592e-05, 'epoch': 0.02} [2024-01-30 16:15:03,248] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8926, 'learning_rate': 1.3717948717948718e-05, 'epoch': 0.02} {'loss': 0.9688, 'learning_rate': 1.3846153846153847e-05, 'epoch': 0.02} [2024-01-30 16:15:38,513] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.895, 'learning_rate': 1.3974358974358975e-05, 'epoch': 0.02} {'loss': 0.9292, 'learning_rate': 1.4102564102564105e-05, 'epoch': 0.02} [2024-01-30 16:16:14,474] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.918, 'learning_rate': 1.4230769230769232e-05, 'epoch': 0.02} {'loss': 0.8926, 'learning_rate': 1.435897435897436e-05, 'epoch': 0.02} {'loss': 0.9824, 'learning_rate': 1.4487179487179489e-05, 'epoch': 0.02} [2024-01-30 16:17:08,880] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8838, 'learning_rate': 1.4615384615384615e-05, 'epoch': 0.02} [2024-01-30 16:17:26,493] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.9121, 'learning_rate': 1.4743589743589745e-05, 'epoch': 0.02} {'loss': 0.9141, 'learning_rate': 1.4871794871794874e-05, 'epoch': 0.02} {'loss': 0.2192, 'learning_rate': 1.5000000000000002e-05, 'epoch': 0.02} {'loss': 0.9307, 'learning_rate': 1.5128205128205129e-05, 'epoch': 0.02} {'loss': 0.9038, 'learning_rate': 1.5256410256410257e-05, 'epoch': 0.02} {'loss': 0.9946, 'learning_rate': 1.5384615384615387e-05, 'epoch': 0.02} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/789401509.jpg' {'loss': 0.8877, 'learning_rate': 1.5512820512820516e-05, 'epoch': 0.02} {'loss': 0.9526, 'learning_rate': 1.5641025641025644e-05, 'epoch': 0.02} {'loss': 0.9678, 'learning_rate': 1.576923076923077e-05, 'epoch': 0.02} {'loss': 0.8691, 'learning_rate': 1.5897435897435897e-05, 'epoch': 0.02} {'loss': 0.8975, 'learning_rate': 1.602564102564103e-05, 'epoch': 0.02} {'loss': 0.9131, 'learning_rate': 1.6153846153846154e-05, 'epoch': 0.02} {'loss': 0.9087, 'learning_rate': 1.6282051282051282e-05, 'epoch': 0.02} {'loss': 0.958, 'learning_rate': 1.641025641025641e-05, 'epoch': 0.02} {'loss': 0.813, 'learning_rate': 1.653846153846154e-05, 'epoch': 0.02} {'loss': 0.8921, 'learning_rate': 1.6666666666666667e-05, 'epoch': 0.03} {'loss': 0.9072, 'learning_rate': 1.6794871794871796e-05, 'epoch': 0.03} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/931580390.jpg' {'loss': 0.9219, 'learning_rate': 1.6923076923076924e-05, 'epoch': 0.03} {'loss': 0.8682, 'learning_rate': 1.7051282051282053e-05, 'epoch': 0.03} {'loss': 0.873, 'learning_rate': 1.717948717948718e-05, 'epoch': 0.03} {'loss': 0.9014, 'learning_rate': 1.730769230769231e-05, 'epoch': 0.03} {'loss': 0.8887, 'learning_rate': 1.7435897435897438e-05, 'epoch': 0.03} [2024-01-30 16:24:20,940] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8511, 'learning_rate': 1.7564102564102566e-05, 'epoch': 0.03} [2024-01-30 16:24:39,213] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.9463, 'learning_rate': 1.7692307692307694e-05, 'epoch': 0.03} {'loss': 0.9619, 'learning_rate': 1.7820512820512823e-05, 'epoch': 0.03} {'loss': 0.8784, 'learning_rate': 1.794871794871795e-05, 'epoch': 0.03} {'loss': 0.9482, 'learning_rate': 1.807692307692308e-05, 'epoch': 0.03} {'loss': 0.9204, 'learning_rate': 1.8205128205128208e-05, 'epoch': 0.03} {'loss': 0.9243, 'learning_rate': 1.8333333333333333e-05, 'epoch': 0.03} {'loss': 0.9507, 'learning_rate': 1.8461538461538465e-05, 'epoch': 0.03} {'loss': 0.9233, 'learning_rate': 1.8589743589743593e-05, 'epoch': 0.03} {'loss': 0.915, 'learning_rate': 1.8717948717948718e-05, 'epoch': 0.03} {'loss': 0.918, 'learning_rate': 1.8846153846153846e-05, 'epoch': 0.03} {'loss': 0.8765, 'learning_rate': 1.8974358974358975e-05, 'epoch': 0.03} {'loss': 0.8789, 'learning_rate': 1.9102564102564106e-05, 'epoch': 0.03} {'loss': 0.8945, 'learning_rate': 1.923076923076923e-05, 'epoch': 0.03} {'loss': 0.8862, 'learning_rate': 1.935897435897436e-05, 'epoch': 0.03} {'loss': 0.9321, 'learning_rate': 1.9487179487179488e-05, 'epoch': 0.03} {'loss': 0.2299, 'learning_rate': 1.9615384615384617e-05, 'epoch': 0.03} {'loss': 0.9346, 'learning_rate': 1.9743589743589745e-05, 'epoch': 0.03} {'loss': 0.855, 'learning_rate': 1.9871794871794873e-05, 'epoch': 0.03} {'loss': 0.8594, 'learning_rate': 2e-05, 'epoch': 0.03} {'loss': 0.9663, 'learning_rate': 1.9999998058827844e-05, 'epoch': 0.03} {'loss': 0.9131, 'learning_rate': 1.9999992235312136e-05, 'epoch': 0.03} {'loss': 0.9399, 'learning_rate': 1.9999982529455127e-05, 'epoch': 0.03} {'loss': 0.8906, 'learning_rate': 1.9999968941260596e-05, 'epoch': 0.03} {'loss': 0.8921, 'learning_rate': 1.9999951470733808e-05, 'epoch': 0.03} {'loss': 0.916, 'learning_rate': 1.9999930117881548e-05, 'epoch': 0.03} [2024-01-30 16:32:20,619] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.9365, 'learning_rate': 1.9999904882712115e-05, 'epoch': 0.03} [2024-01-30 16:32:40,306] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.9272, 'learning_rate': 1.99998757652353e-05, 'epoch': 0.03} [2024-01-30 16:32:59,251] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.2394, 'learning_rate': 1.9999842765462403e-05, 'epoch': 0.03} {'loss': 0.9136, 'learning_rate': 1.999980588340624e-05, 'epoch': 0.03} {'loss': 0.916, 'learning_rate': 1.9999765119081132e-05, 'epoch': 0.03} {'loss': 0.9355, 'learning_rate': 1.9999720472502902e-05, 'epoch': 0.03} {'loss': 0.8936, 'learning_rate': 1.9999671943688885e-05, 'epoch': 0.03} {'loss': 0.9146, 'learning_rate': 1.9999619532657915e-05, 'epoch': 0.03} {'loss': 0.8794, 'learning_rate': 1.9999563239430352e-05, 'epoch': 0.03} {'loss': 0.7959, 'learning_rate': 1.9999503064028043e-05, 'epoch': 0.03} {'loss': 0.9355, 'learning_rate': 1.999943900647435e-05, 'epoch': 0.03} {'loss': 0.9009, 'learning_rate': 1.9999371066794146e-05, 'epoch': 0.03} {'loss': 0.9263, 'learning_rate': 1.9999299245013805e-05, 'epoch': 0.03} {'loss': 0.9214, 'learning_rate': 1.999922354116121e-05, 'epoch': 0.03} {'loss': 0.8823, 'learning_rate': 1.999914395526575e-05, 'epoch': 0.03} {'loss': 0.8804, 'learning_rate': 1.9999060487358333e-05, 'epoch': 0.03} {'loss': 0.8491, 'learning_rate': 1.9998973137471352e-05, 'epoch': 0.03} {'loss': 0.9204, 'learning_rate': 1.9998881905638727e-05, 'epoch': 0.03} {'loss': 0.2222, 'learning_rate': 1.9998786791895874e-05, 'epoch': 0.03} {'loss': 0.8945, 'learning_rate': 1.999868779627972e-05, 'epoch': 0.04} {'loss': 0.9609, 'learning_rate': 1.9998584918828695e-05, 'epoch': 0.04} {'loss': 0.8853, 'learning_rate': 1.9998478159582747e-05, 'epoch': 0.04} {'loss': 0.894, 'learning_rate': 1.999836751858332e-05, 'epoch': 0.04} {'loss': 0.9521, 'learning_rate': 1.9998252995873367e-05, 'epoch': 0.04} {'loss': 0.9292, 'learning_rate': 1.999813459149735e-05, 'epoch': 0.04} {'loss': 0.2572, 'learning_rate': 1.9998012305501243e-05, 'epoch': 0.04} {'loss': 0.8833, 'learning_rate': 1.999788613793251e-05, 'epoch': 0.04} {'loss': 0.8232, 'learning_rate': 1.999775608884015e-05, 'epoch': 0.04} {'loss': 0.2317, 'learning_rate': 1.9997622158274635e-05, 'epoch': 0.04} {'loss': 0.8892, 'learning_rate': 1.9997484346287973e-05, 'epoch': 0.04} {'loss': 0.8916, 'learning_rate': 1.9997342652933668e-05, 'epoch': 0.04} {'loss': 0.9834, 'learning_rate': 1.9997197078266723e-05, 'epoch': 0.04} {'loss': 0.9209, 'learning_rate': 1.999704762234366e-05, 'epoch': 0.04} {'loss': 0.8774, 'learning_rate': 1.99968942852225e-05, 'epoch': 0.04} {'loss': 0.9336, 'learning_rate': 1.9996737066962778e-05, 'epoch': 0.04} {'loss': 0.9458, 'learning_rate': 1.9996575967625525e-05, 'epoch': 0.04} {'loss': 0.9302, 'learning_rate': 1.999641098727329e-05, 'epoch': 0.04} {'loss': 0.8755, 'learning_rate': 1.999624212597013e-05, 'epoch': 0.04} {'loss': 0.8994, 'learning_rate': 1.9996069383781587e-05, 'epoch': 0.04} {'loss': 0.9521, 'learning_rate': 1.9995892760774738e-05, 'epoch': 0.04} {'loss': 0.9517, 'learning_rate': 1.9995712257018153e-05, 'epoch': 0.04} {'loss': 0.9326, 'learning_rate': 1.9995527872581903e-05, 'epoch': 0.04} {'loss': 0.8623, 'learning_rate': 1.9995339607537578e-05, 'epoch': 0.04} {'loss': 0.245, 'learning_rate': 1.9995147461958267e-05, 'epoch': 0.04} {'loss': 0.8838, 'learning_rate': 1.999495143591857e-05, 'epoch': 0.04} {'loss': 0.8975, 'learning_rate': 1.999475152949459e-05, 'epoch': 0.04} {'loss': 0.8872, 'learning_rate': 1.9994547742763935e-05, 'epoch': 0.04} {'loss': 0.9365, 'learning_rate': 1.9994340075805724e-05, 'epoch': 0.04} {'loss': 0.9316, 
'learning_rate': 1.9994128528700583e-05, 'epoch': 0.04} {'loss': 0.915, 'learning_rate': 1.9993913101530635e-05, 'epoch': 0.04} {'loss': 0.8413, 'learning_rate': 1.9993693794379525e-05, 'epoch': 0.04} {'loss': 0.8862, 'learning_rate': 1.9993470607332387e-05, 'epoch': 0.04} {'loss': 0.9004, 'learning_rate': 1.999324354047588e-05, 'epoch': 0.04} {'loss': 0.9121, 'learning_rate': 1.9993012593898146e-05, 'epoch': 0.04} {'loss': 0.9341, 'learning_rate': 1.9992777767688857e-05, 'epoch': 0.04} {'loss': 0.1997, 'learning_rate': 1.9992539061939175e-05, 'epoch': 0.04} {'loss': 0.2354, 'learning_rate': 1.999229647674178e-05, 'epoch': 0.04} {'loss': 0.9629, 'learning_rate': 1.9992050012190845e-05, 'epoch': 0.04} {'loss': 0.2268, 'learning_rate': 1.9991799668382058e-05, 'epoch': 0.04} {'loss': 0.9067, 'learning_rate': 1.9991545445412614e-05, 'epoch': 0.04} {'loss': 0.9048, 'learning_rate': 1.9991287343381208e-05, 'epoch': 0.04} [2024-01-30 16:51:19,554] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.9727, 'learning_rate': 1.9991025362388044e-05, 'epoch': 0.04} [2024-01-30 16:51:36,386] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.9395, 'learning_rate': 1.9990759502534835e-05, 'epoch': 0.04} {'loss': 0.8892, 'learning_rate': 1.9990489763924796e-05, 'epoch': 0.04} {'loss': 0.2173, 'learning_rate': 1.9990216146662648e-05, 'epoch': 0.04} {'loss': 0.8672, 'learning_rate': 1.9989938650854618e-05, 'epoch': 0.04} {'loss': 0.9351, 'learning_rate': 1.998965727660844e-05, 'epoch': 0.04} {'loss': 0.8745, 'learning_rate': 1.9989372024033352e-05, 'epoch': 0.04} {'loss': 0.876, 'learning_rate': 1.99890828932401e-05, 'epoch': 0.04} {'loss': 0.8745, 'learning_rate': 1.9988789884340938e-05, 'epoch': 0.04} {'loss': 0.8853, 'learning_rate': 1.9988492997449615e-05, 'epoch': 0.04} {'loss': 0.8921, 'learning_rate': 1.9988192232681398e-05, 'epoch': 0.05} {'loss': 0.8853, 'learning_rate': 1.9987887590153055e-05, 'epoch': 0.05} {'loss': 0.8872, 'learning_rate': 1.9987579069982856e-05, 'epoch': 0.05} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/570035651.jpg' {'loss': 0.9448, 'learning_rate': 1.9987266672290577e-05, 'epoch': 0.05} {'loss': 0.9258, 'learning_rate': 1.9986950397197503e-05, 'epoch': 0.05} {'loss': 0.9336, 'learning_rate': 1.9986630244826425e-05, 'epoch': 0.05} {'loss': 0.9033, 'learning_rate': 1.998630621530164e-05, 'epoch': 0.05} {'loss': 0.9463, 'learning_rate': 1.998597830874894e-05, 'epoch': 0.05} {'loss': 0.9136, 'learning_rate': 1.9985646525295634e-05, 'epoch': 0.05} {'loss': 0.8252, 'learning_rate': 1.998531086507053e-05, 'epoch': 0.05} {'loss': 0.8628, 'learning_rate': 1.9984971328203945e-05, 'epoch': 0.05} {'loss': 0.8857, 'learning_rate': 1.9984627914827698e-05, 'epoch': 
0.05} {'loss': 0.9019, 'learning_rate': 1.9984280625075115e-05, 'epoch': 0.05} {'loss': 0.9546, 'learning_rate': 1.9983929459081022e-05, 'epoch': 0.05} {'loss': 0.8931, 'learning_rate': 1.998357441698176e-05, 'epoch': 0.05} {'loss': 0.8999, 'learning_rate': 1.998321549891516e-05, 'epoch': 0.05} {'loss': 0.8672, 'learning_rate': 1.9982852705020572e-05, 'epoch': 0.05} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/953735702.jpg' {'loss': 0.854, 'learning_rate': 1.9982486035438848e-05, 'epoch': 0.05} {'loss': 0.2339, 'learning_rate': 1.9982115490312334e-05, 'epoch': 0.05} {'loss': 0.8945, 'learning_rate': 1.9981741069784894e-05, 'epoch': 0.05} {'loss': 0.9141, 'learning_rate': 1.9981362774001886e-05, 'epoch': 0.05} {'loss': 0.8896, 'learning_rate': 1.9980980603110185e-05, 'epoch': 0.05} {'loss': 0.9209, 'learning_rate': 1.9980594557258158e-05, 'epoch': 0.05} {'loss': 0.8687, 'learning_rate': 1.9980204636595682e-05, 'epoch': 0.05} {'loss': 0.8174, 'learning_rate': 1.9979810841274135e-05, 'epoch': 0.05} {'loss': 0.9097, 'learning_rate': 1.9979413171446403e-05, 'epoch': 0.05} {'loss': 0.9287, 'learning_rate': 1.9979011627266884e-05, 'epoch': 0.05} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1878239589.jpg' {'loss': 0.8965, 'learning_rate': 1.997860620889146e-05, 'epoch': 0.05} {'loss': 0.9087, 'learning_rate': 1.997819691647753e-05, 'epoch': 0.05} {'loss': 0.7778, 'learning_rate': 1.9977783750184e-05, 'epoch': 0.05} {'loss': 0.916, 'learning_rate': 1.9977366710171274e-05, 'epoch': 0.05} {'loss': 0.8979, 'learning_rate': 1.9976945796601258e-05, 'epoch': 0.05} {'loss': 0.9048, 'learning_rate': 1.9976521009637366e-05, 'epoch': 0.05} {'loss': 0.897, 'learning_rate': 1.997609234944452e-05, 'epoch': 0.05} {'loss': 0.8989, 'learning_rate': 1.9975659816189137e-05, 'epoch': 0.05} {'loss': 0.9727, 'learning_rate': 1.997522341003914e-05, 'epoch': 0.05} [2024-01-30 17:05:17,640] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8257, 'learning_rate': 1.9974783131163957e-05, 'epoch': 0.05} [2024-01-30 17:05:35,370] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8862, 'learning_rate': 1.9974338979734523e-05, 'epoch': 0.05} {'loss': 0.9043, 'learning_rate': 1.997389095592327e-05, 'epoch': 0.05} {'loss': 0.9287, 'learning_rate': 1.9973439059904133e-05, 'epoch': 0.05} {'loss': 0.916, 'learning_rate': 1.9972983291852565e-05, 'epoch': 0.05} {'loss': 0.9092, 'learning_rate': 1.9972523651945496e-05, 'epoch': 0.05} {'loss': 0.9219, 'learning_rate': 1.9972060140361384e-05, 'epoch': 0.05} {'loss': 0.8882, 'learning_rate': 1.997159275728018e-05, 'epoch': 0.05} {'loss': 0.9292, 'learning_rate': 1.9971121502883332e-05, 'epoch': 0.05} {'loss': 0.915, 'learning_rate': 1.9970646377353802e-05, 'epoch': 0.05} {'loss': 0.9253, 'learning_rate': 1.997016738087605e-05, 'epoch': 0.05} {'loss': 0.9131, 'learning_rate': 1.9969684513636035e-05, 'epoch': 0.05} {'loss': 0.8921, 'learning_rate': 1.9969197775821227e-05, 'epoch': 0.05} {'loss': 0.8916, 'learning_rate': 1.9968707167620593e-05, 'epoch': 0.05} {'loss': 0.8745, 'learning_rate': 1.9968212689224603e-05, 'epoch': 0.05} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/965776611.jpg' {'loss': 0.9341, 'learning_rate': 1.996771434082523e-05, 'epoch': 0.05} {'loss': 0.8281, 'learning_rate': 1.9967212122615958e-05, 'epoch': 0.06} {'loss': 0.8882, 'learning_rate': 1.9966706034791752e-05, 'epoch': 0.06} {'loss': 0.8062, 'learning_rate': 1.9966196077549106e-05, 'epoch': 0.06} {'loss': 0.853, 'learning_rate': 1.996568225108599e-05, 'epoch': 0.06} {'loss': 0.9062, 'learning_rate': 1.99651645556019e-05, 'epoch': 0.06} {'loss': 0.8872, 'learning_rate': 1.9964642991297817e-05, 'epoch': 0.06} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/812931432.jpg' {'loss': 0.917, 'learning_rate': 1.996411755837623e-05, 'epoch': 0.06} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1878239376.jpg' {'loss': 0.873, 'learning_rate': 1.9963588257041137e-05, 'epoch': 0.06} {'loss': 0.8823, 'learning_rate': 1.996305508749802e-05, 'epoch': 0.06} {'loss': 0.9238, 'learning_rate': 1.9962518049953887e-05, 'epoch': 0.06} {'loss': 0.8662, 'learning_rate': 1.9961977144617225e-05, 'epoch': 0.06} {'loss': 0.9175, 'learning_rate': 1.996143237169803e-05, 'epoch': 0.06} {'loss': 0.2467, 'learning_rate': 1.996088373140781e-05, 'epoch': 0.06} {'loss': 0.8921, 'learning_rate': 1.9960331223959564e-05, 'epoch': 0.06} {'loss': 0.8965, 'learning_rate': 1.995977484956779e-05, 'epoch': 0.06} {'loss': 0.8652, 'learning_rate': 1.9959214608448495e-05, 'epoch': 0.06} {'loss': 0.9185, 'learning_rate': 1.9958650500819183e-05, 'epoch': 0.06} {'loss': 0.9443, 'learning_rate': 1.995808252689886e-05, 'epoch': 0.06} {'loss': 0.8652, 'learning_rate': 1.9957510686908034e-05, 'epoch': 0.06} {'loss': 0.8457, 'learning_rate': 1.9956934981068713e-05, 'epoch': 0.06} {'loss': 0.8984, 'learning_rate': 1.9956355409604402e-05, 'epoch': 0.06} {'loss': 0.8477, 'learning_rate': 1.9955771972740118e-05, 'epoch': 0.06} {'loss': 0.9297, 'learning_rate': 1.9955184670702363e-05, 'epoch': 0.06} {'loss': 0.8594, 'learning_rate': 1.995459350371915e-05, 'epoch': 0.06} {'loss': 0.8784, 'learning_rate': 1.9953998472019996e-05, 'epoch': 0.06} {'loss': 0.8979, 'learning_rate': 1.995339957583591e-05, 'epoch': 0.06} {'loss': 0.8828, 'learning_rate': 1.9952796815399403e-05, 'epoch': 0.06} {'loss': 0.8379, 'learning_rate': 1.9952190190944484e-05, 
'epoch': 0.06} {'loss': 0.9189, 'learning_rate': 1.9951579702706668e-05, 'epoch': 0.06} {'loss': 0.9404, 'learning_rate': 1.9950965350922975e-05, 'epoch': 0.06} {'loss': 0.8335, 'learning_rate': 1.9950347135831907e-05, 'epoch': 0.06} {'loss': 0.8945, 'learning_rate': 1.994972505767348e-05, 'epoch': 0.06} {'loss': 0.9126, 'learning_rate': 1.994909911668921e-05, 'epoch': 0.06} {'loss': 0.896, 'learning_rate': 1.99484693131221e-05, 'epoch': 0.06} {'loss': 0.8906, 'learning_rate': 1.994783564721667e-05, 'epoch': 0.06} {'loss': 0.875, 'learning_rate': 1.9947198119218924e-05, 'epoch': 0.06} {'loss': 0.8579, 'learning_rate': 1.994655672937638e-05, 'epoch': 0.06} {'loss': 0.2104, 'learning_rate': 1.9945911477938044e-05, 'epoch': 0.06} {'loss': 0.9756, 'learning_rate': 1.994526236515442e-05, 'epoch': 0.06} {'loss': 0.9126, 'learning_rate': 1.994460939127753e-05, 'epoch': 0.06} {'loss': 0.9512, 'learning_rate': 1.9943952556560863e-05, 'epoch': 0.06} {'loss': 0.8335, 'learning_rate': 1.9943291861259433e-05, 'epoch': 0.06} {'loss': 0.9331, 'learning_rate': 1.9942627305629747e-05, 'epoch': 0.06} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/883880075.jpg' {'loss': 0.8794, 'learning_rate': 1.9941958889929808e-05, 'epoch': 0.06} {'loss': 0.8765, 'learning_rate': 1.9941286614419113e-05, 'epoch': 0.06} {'loss': 0.8472, 'learning_rate': 1.994061047935867e-05, 'epoch': 0.06} {'loss': 0.9082, 'learning_rate': 1.9939930485010968e-05, 'epoch': 0.06} {'loss': 0.8911, 'learning_rate': 1.9939246631640014e-05, 'epoch': 0.06} {'loss': 0.8833, 'learning_rate': 1.99385589195113e-05, 'epoch': 0.06} {'loss': 0.8262, 'learning_rate': 1.9937867348891815e-05, 'epoch': 0.06} {'loss': 0.9028, 'learning_rate': 1.9937171920050057e-05, 'epoch': 0.06} {'loss': 0.8979, 'learning_rate': 1.9936472633256012e-05, 'epoch': 0.06} {'loss': 0.8804, 'learning_rate': 1.9935769488781167e-05, 'epoch': 0.07} {'loss': 0.896, 'learning_rate': 1.993506248689851e-05, 'epoch': 0.07} {'loss': 0.9189, 'learning_rate': 1.993435162788252e-05, 'epoch': 0.07} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/412076217.jpg' {'loss': 0.8945, 'learning_rate': 1.993363691200918e-05, 'epoch': 0.07} {'loss': 0.9136, 'learning_rate': 1.9932918339555965e-05, 'epoch': 0.07} {'loss': 0.835, 'learning_rate': 1.9932195910801848e-05, 'epoch': 0.07} {'loss': 0.9385, 'learning_rate': 1.9931469626027305e-05, 'epoch': 0.07} {'loss': 0.8687, 'learning_rate': 1.9930739485514304e-05, 'epoch': 0.07} {'loss': 0.8877, 'learning_rate': 1.9930005489546308e-05, 'epoch': 0.07} {'loss': 0.8906, 'learning_rate': 1.9929267638408277e-05, 'epoch': 0.07} {'loss': 0.9111, 'learning_rate': 1.9928525932386678e-05, 'epoch': 0.07} {'loss': 0.8857, 'learning_rate': 1.9927780371769463e-05, 'epoch': 0.07} {'loss': 0.9097, 'learning_rate': 1.9927030956846083e-05, 'epoch': 0.07} {'loss': 0.8472, 'learning_rate': 1.992627768790749e-05, 'epoch': 0.07} {'loss': 0.8809, 'learning_rate': 1.9925520565246125e-05, 'epoch': 0.07} {'loss': 0.2281, 'learning_rate': 1.9924759589155932e-05, 'epoch': 0.07} {'loss': 0.8579, 'learning_rate': 1.9923994759932344e-05, 'epoch': 0.07} {'loss': 0.9321, 'learning_rate': 1.9923226077872296e-05, 'epoch': 0.07} {'loss': 0.873, 'learning_rate': 1.9922453543274223e-05, 'epoch': 0.07} {'loss': 0.8647, 'learning_rate': 1.9921677156438044e-05, 'epoch': 0.07} {'loss': 0.876, 'learning_rate': 1.9920896917665178e-05, 'epoch': 0.07} {'loss': 0.9355, 'learning_rate': 1.992011282725854e-05, 'epoch': 0.07} {'loss': 0.9473, 
'learning_rate': 1.9919324885522548e-05, 'epoch': 0.07} {'loss': 0.8931, 'learning_rate': 1.99185330927631e-05, 'epoch': 0.07} {'loss': 0.895, 'learning_rate': 1.99177374492876e-05, 'epoch': 0.07} {'loss': 0.8584, 'learning_rate': 1.991693795540494e-05, 'epoch': 0.07} {'loss': 0.8853, 'learning_rate': 1.9916134611425522e-05, 'epoch': 0.07} {'loss': 0.8638, 'learning_rate': 1.9915327417661226e-05, 'epoch': 0.07} {'loss': 0.8711, 'learning_rate': 1.991451637442543e-05, 'epoch': 0.07} {'loss': 0.9014, 'learning_rate': 1.9913701482033008e-05, 'epoch': 0.07} [2024-01-30 17:35:08,418] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.9014, 'learning_rate': 1.9912882740800336e-05, 'epoch': 0.07} [2024-01-30 17:35:27,481] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8774, 'learning_rate': 1.9912060151045273e-05, 'epoch': 0.07} {'loss': 0.8887, 'learning_rate': 1.9911233713087172e-05, 'epoch': 0.07} [2024-01-30 17:36:08,639] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.916, 'learning_rate': 1.9910403427246895e-05, 'epoch': 0.07} [2024-01-30 17:36:28,227] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8857, 'learning_rate': 1.990956929384678e-05, 'epoch': 0.07} {'loss': 0.8877, 'learning_rate': 1.990873131321067e-05, 'epoch': 0.07} {'loss': 0.8716, 'learning_rate': 1.9907889485663897e-05, 'epoch': 0.07} {'loss': 0.9009, 'learning_rate': 1.9907043811533283e-05, 'epoch': 0.07} {'loss': 0.8638, 'learning_rate': 1.9906194291147155e-05, 'epoch': 0.07} {'loss': 0.8608, 'learning_rate': 1.9905340924835322e-05, 'epoch': 0.07} {'loss': 0.9248, 'learning_rate': 1.9904483712929094e-05, 'epoch': 0.07} {'loss': 0.8535, 'learning_rate': 1.9903622655761267e-05, 'epoch': 0.07} {'loss': 0.8823, 'learning_rate': 1.990275775366613e-05, 'epoch': 0.07} {'loss': 0.8643, 'learning_rate': 1.9901889006979473e-05, 'epoch': 0.07} {'loss': 0.895, 'learning_rate': 1.990101641603857e-05, 'epoch': 0.07} {'loss': 0.875, 'learning_rate': 1.9900139981182193e-05, 'epoch': 0.07} {'loss': 0.8809, 'learning_rate': 1.9899259702750604e-05, 'epoch': 0.07} {'loss': 0.8569, 'learning_rate': 1.9898375581085555e-05, 'epoch': 0.07} {'loss': 0.874, 'learning_rate': 1.9897487616530296e-05, 'epoch': 0.07} {'loss': 0.9023, 'learning_rate': 1.9896595809429565e-05, 'epoch': 0.07} {'loss': 0.8965, 'learning_rate': 1.9895700160129593e-05, 'epoch': 0.07} {'loss': 0.8677, 'learning_rate': 1.9894800668978095e-05, 'epoch': 0.07} {'loss': 0.897, 'learning_rate': 1.9893897336324292e-05, 'epoch': 0.08} {'loss': 0.8979, 'learning_rate': 1.9892990162518884e-05, 'epoch': 0.08} {'loss': 0.9053, 'learning_rate': 1.9892079147914072e-05, 'epoch': 0.08} {'loss': 0.9175, 'learning_rate': 1.9891164292863537e-05, 'epoch': 0.08} [2024-01-30 17:43:09,778] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8809, 'learning_rate': 1.9890245597722465e-05, 'epoch': 0.08} [2024-01-30 17:43:27,682] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.9175, 'learning_rate': 1.9889323062847516e-05, 'epoch': 0.08} {'loss': 0.896, 'learning_rate': 1.988839668859686e-05, 'epoch': 0.08} {'loss': 0.9141, 'learning_rate': 1.988746647533014e-05, 'epoch': 0.08} {'loss': 0.8535, 'learning_rate': 1.9886532423408495e-05, 'epoch': 0.08} {'loss': 0.918, 'learning_rate': 1.9885594533194564e-05, 'epoch': 0.08} {'loss': 0.9087, 'learning_rate': 1.9884652805052465e-05, 'epoch': 0.08} {'loss': 0.8413, 'learning_rate': 1.9883707239347804e-05, 'epoch': 0.08} {'loss': 0.8613, 'learning_rate': 1.988275783644769e-05, 'epoch': 0.08} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/877936293.jpg' {'loss': 0.8745, 'learning_rate': 1.988180459672071e-05, 'epoch': 0.08} {'loss': 0.9185, 'learning_rate': 1.988084752053695e-05, 'epoch': 0.08} {'loss': 0.9033, 'learning_rate': 1.9879886608267967e-05, 'epoch': 0.08} {'loss': 0.9155, 'learning_rate': 1.9878921860286832e-05, 'epoch': 0.08} {'loss': 0.9204, 'learning_rate': 1.9877953276968088e-05, 'epoch': 0.08} {'loss': 0.8667, 'learning_rate': 1.9876980858687777e-05, 'epoch': 0.08} {'loss': 0.8662, 'learning_rate': 1.9876004605823417e-05, 'epoch': 0.08} {'loss': 0.2332, 'learning_rate': 1.987502451875403e-05, 'epoch': 0.08} {'loss': 0.2334, 'learning_rate': 1.987404059786012e-05, 'epoch': 0.08} {'loss': 0.8716, 'learning_rate': 1.9873052843523676e-05, 'epoch': 0.08} {'loss': 0.894, 'learning_rate': 1.987206125612818e-05, 'epoch': 0.08} {'loss': 0.8853, 'learning_rate': 1.98710658360586e-05, 'epoch': 0.08} {'loss': 0.896, 'learning_rate': 1.987006658370139e-05, 'epoch': 0.08} {'loss': 0.8618, 'learning_rate': 1.9869063499444495e-05, 'epoch': 0.08} {'loss': 0.9058, 'learning_rate': 1.9868056583677346e-05, 'epoch': 0.08} {'loss': 0.9194, 'learning_rate': 1.9867045836790867e-05, 'epoch': 0.08} {'loss': 0.8643, 'learning_rate': 1.9866031259177463e-05, 'epoch': 0.08} {'loss': 0.917, 'learning_rate': 1.9865012851231022e-05, 'epoch': 0.08} {'loss': 0.894, 'learning_rate': 1.9863990613346936e-05, 'epoch': 0.08} [2024-01-30 17:51:42,561] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.9214, 'learning_rate': 1.986296454592206e-05, 'epoch': 0.08} [2024-01-30 17:52:01,933] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8901, 'learning_rate': 1.9861934649354763e-05, 'epoch': 0.08} {'loss': 0.9219, 'learning_rate': 1.9860900924044873e-05, 'epoch': 0.08} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1568811012.jpg' [2024-01-30 17:52:40,894] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8867, 'learning_rate': 1.9859863370393726e-05, 'epoch': 0.08} {'loss': 0.8027, 'learning_rate': 1.9858821988804132e-05, 'epoch': 0.08} {'loss': 0.9048, 'learning_rate': 1.9857776779680393e-05, 'epoch': 0.08} {'loss': 0.8555, 'learning_rate': 1.98567277434283e-05, 'epoch': 0.08} {'loss': 0.8916, 'learning_rate': 1.9855674880455115e-05, 'epoch': 0.08} {'loss': 0.875, 'learning_rate': 1.98546181911696e-05, 'epoch': 0.08} {'loss': 0.9346, 'learning_rate': 1.9853557675982e-05, 'epoch': 0.08} {'loss': 0.9165, 'learning_rate': 1.985249333530404e-05, 'epoch': 0.08} {'loss': 0.8481, 'learning_rate': 1.9851425169548938e-05, 'epoch': 0.08} {'loss': 0.833, 'learning_rate': 1.9850353179131392e-05, 'epoch': 0.08} [2024-01-30 17:55:52,878] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.2903, 'learning_rate': 1.9849277364467585e-05, 'epoch': 0.08} {'loss': 0.2892, 'learning_rate': 1.984819772597518e-05, 'epoch': 0.08} {'loss': 0.8643, 'learning_rate': 1.9847114264073336e-05, 'epoch': 0.08} [2024-01-30 17:56:45,187] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8892, 'learning_rate': 1.984602697918269e-05, 'epoch': 0.08} {'loss': 0.8955, 'learning_rate': 1.9844935871725363e-05, 'epoch': 0.08} {'loss': 0.8789, 'learning_rate': 1.9843840942124956e-05, 'epoch': 0.08} {'loss': 0.8545, 'learning_rate': 1.9842742190806566e-05, 'epoch': 0.08} [2024-01-30 17:57:57,840] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.9375, 'learning_rate': 1.984163961819676e-05, 'epoch': 0.09} {'loss': 0.8691, 'learning_rate': 1.9840533224723595e-05, 'epoch': 0.09} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1567615317.jpg' {'loss': 0.8818, 'learning_rate': 1.9839423010816616e-05, 'epoch': 0.09} {'loss': 0.8765, 'learning_rate': 1.983830897690684e-05, 'epoch': 0.09} {'loss': 0.8994, 'learning_rate': 1.9837191123426777e-05, 'epoch': 0.09} {'loss': 0.8994, 'learning_rate': 1.983606945081042e-05, 'epoch': 0.09} {'loss': 0.9067, 'learning_rate': 1.983494395949323e-05, 'epoch': 0.09} {'loss': 0.8936, 'learning_rate': 1.983381464991217e-05, 'epoch': 0.09} {'loss': 0.895, 'learning_rate': 1.9832681522505676e-05, 'epoch': 0.09} {'loss': 0.8276, 'learning_rate': 1.9831544577713663e-05, 'epoch': 0.09} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/201112973.jpg' {'loss': 0.9355, 'learning_rate': 1.983040381597754e-05, 'epoch': 0.09} {'loss': 0.8789, 'learning_rate': 1.982925923774018e-05, 'epoch': 0.09} {'loss': 0.8228, 'learning_rate': 1.9828110843445954e-05, 'epoch': 0.09} {'loss': 0.8599, 'learning_rate': 1.982695863354071e-05, 'epoch': 0.09} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/268016860.jpg' {'loss': 0.8691, 'learning_rate': 1.9825802608471767e-05, 'epoch': 0.09} {'loss': 0.8516, 'learning_rate': 1.982464276868794e-05, 'epoch': 0.09} {'loss': 0.918, 'learning_rate': 1.982347911463952e-05, 'epoch': 0.09} {'loss': 0.9087, 'learning_rate': 1.9822311646778277e-05, 'epoch': 0.09} {'loss': 0.8682, 'learning_rate': 1.982114036555746e-05, 'epoch': 0.09} {'loss': 0.853, 'learning_rate': 1.9819965271431797e-05, 'epoch': 0.09} {'loss': 0.8574, 'learning_rate': 1.9818786364857506e-05, 'epoch': 0.09} WARNING: tokenization mismatch: 1 vs. 70. (ignored) {'loss': 0.8779, 'learning_rate': 1.9817603646292278e-05, 'epoch': 0.09} {'loss': 0.8989, 'learning_rate': 1.9816417116195287e-05, 'epoch': 0.09} {'loss': 0.8594, 'learning_rate': 1.9815226775027182e-05, 'epoch': 0.09} [2024-01-30 18:05:24,788] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8877, 'learning_rate': 1.9814032623250093e-05, 'epoch': 0.09} {'loss': 0.9512, 'learning_rate': 1.9812834661327632e-05, 'epoch': 0.09} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/185343342X.jpg' [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/968557902.jpg' {'loss': 0.8589, 'learning_rate': 1.9811632889724888e-05, 'epoch': 0.09} {'loss': 0.8369, 'learning_rate': 1.9810427308908437e-05, 'epoch': 0.09} {'loss': 0.8047, 'learning_rate': 1.9809217919346318e-05, 'epoch': 0.09} {'loss': 0.8096, 'learning_rate': 1.980800472150806e-05, 'epoch': 0.09} {'loss': 0.9492, 'learning_rate': 1.9806787715864674e-05, 'epoch': 0.09} {'loss': 0.8843, 'learning_rate': 1.9805566902888637e-05, 'epoch': 0.09} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/818405988.jpg' {'loss': 0.9189, 'learning_rate': 1.9804342283053916e-05, 'epoch': 0.09} {'loss': 0.9048, 'learning_rate': 1.980311385683594e-05, 'epoch': 0.09} {'loss': 0.9395, 'learning_rate': 1.980188162471164e-05, 'epoch': 0.09} {'loss': 0.8716, 'learning_rate': 1.98006455871594e-05, 'epoch': 0.09} {'loss': 0.8809, 'learning_rate': 1.97994057446591e-05, 'epoch': 0.09} {'loss': 0.835, 'learning_rate': 1.979816209769209e-05, 'epoch': 0.09} {'loss': 0.8677, 'learning_rate': 1.9796914646741187e-05, 'epoch': 0.09} {'loss': 0.8271, 'learning_rate': 1.9795663392290702e-05, 'epoch': 0.09} {'loss': 0.8745, 'learning_rate': 1.9794408334826415e-05, 'epoch': 0.09} {'loss': 0.8267, 'learning_rate': 1.979314947483558e-05, 'epoch': 0.09} {'loss': 0.8916, 'learning_rate': 1.9791886812806932e-05, 'epoch': 0.09} {'loss': 0.9375, 'learning_rate': 1.9790620349230676e-05, 'epoch': 0.09} {'loss': 0.8433, 'learning_rate': 1.9789350084598504e-05, 'epoch': 0.09} {'loss': 0.8789, 'learning_rate': 1.9788076019403565e-05, 'epoch': 0.09} {'loss': 0.8291, 'learning_rate': 1.9786798154140507e-05, 'epoch': 0.09} {'loss': 0.9229, 'learning_rate': 1.9785516489305437e-05, 'epoch': 0.09} {'loss': 0.3138, 'learning_rate': 1.9784231025395936e-05, 'epoch': 0.09} {'loss': 0.8965, 'learning_rate': 1.9782941762911075e-05, 'epoch': 0.09} {'loss': 0.8896, 'learning_rate': 1.9781648702351383e-05, 'epoch': 0.09} {'loss': 0.8892, 'learning_rate': 1.9780351844218874e-05, 'epoch': 0.09} {'loss': 0.9048, 'learning_rate': 1.977905118901703e-05, 'epoch': 0.1} {'loss': 0.8662, 'learning_rate': 1.977774673725081e-05, 'epoch': 0.1} {'loss': 0.8306, 'learning_rate': 1.977643848942665e-05, 'epoch': 0.1} {'loss': 0.2664, 'learning_rate': 1.977512644605246e-05, 'epoch': 0.1} {'loss': 0.8628, 'learning_rate': 1.9773810607637612e-05, 'epoch': 0.1} {'loss': 0.8511, 'learning_rate': 1.9772490974692962e-05, 'epoch': 0.1} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/877793417.jpg' {'loss': 0.8843, 'learning_rate': 1.9771167547730844e-05, 'epoch': 0.1} {'loss': 0.916, 'learning_rate': 1.976984032726505e-05, 'epoch': 0.1} {'loss': 0.9214, 'learning_rate': 1.976850931381086e-05, 'epoch': 0.1} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/471523771.jpg' {'loss': 0.9062, 'learning_rate': 1.976717450788501e-05, 'epoch': 0.1} {'loss': 0.8657, 'learning_rate': 1.9765835910005726e-05, 'epoch': 0.1} {'loss': 0.8433, 'learning_rate': 1.9764493520692685e-05, 'epoch': 0.1} {'loss': 0.8521, 'learning_rate': 
1.9763147340467067e-05, 'epoch': 0.1} {'loss': 0.9556, 'learning_rate': 1.9761797369851498e-05, 'epoch': 0.1} {'loss': 0.835, 'learning_rate': 1.9760443609370074e-05, 'epoch': 0.1} {'loss': 0.8628, 'learning_rate': 1.975908605954838e-05, 'epoch': 0.1} {'loss': 0.8745, 'learning_rate': 1.9757724720913466e-05, 'epoch': 0.1} {'loss': 0.9077, 'learning_rate': 1.9756359593993845e-05, 'epoch': 0.1} {'loss': 0.8657, 'learning_rate': 1.975499067931951e-05, 'epoch': 0.1} {'loss': 0.2382, 'learning_rate': 1.975361797742192e-05, 'epoch': 0.1} {'loss': 0.9214, 'learning_rate': 1.9752241488834002e-05, 'epoch': 0.1} {'loss': 0.8945, 'learning_rate': 1.975086121409016e-05, 'epoch': 0.1} {'loss': 0.8833, 'learning_rate': 1.974947715372626e-05, 'epoch': 0.1} {'loss': 0.8354, 'learning_rate': 1.974808930827965e-05, 'epoch': 0.1} {'loss': 0.8291, 'learning_rate': 1.9746697678289128e-05, 'epoch': 0.1} {'loss': 0.853, 'learning_rate': 1.9745302264294982e-05, 'epoch': 0.1} {'loss': 0.8696, 'learning_rate': 1.9743903066838954e-05, 'epoch': 0.1} {'loss': 0.8716, 'learning_rate': 1.9742500086464266e-05, 'epoch': 0.1} {'loss': 0.896, 'learning_rate': 1.9741093323715597e-05, 'epoch': 0.1} {'loss': 0.2428, 'learning_rate': 1.9739682779139107e-05, 'epoch': 0.1} {'loss': 0.9004, 'learning_rate': 1.9738268453282414e-05, 'epoch': 0.1} {'loss': 0.8721, 'learning_rate': 1.9736850346694608e-05, 'epoch': 0.1} {'loss': 0.8882, 'learning_rate': 1.973542845992625e-05, 'epoch': 0.1} {'loss': 0.9087, 'learning_rate': 1.9734002793529362e-05, 'epoch': 0.1} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1575662698.jpg' {'loss': 0.8374, 'learning_rate': 1.9732573348057437e-05, 'epoch': 0.1} [2024-01-30 18:24:33,337] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
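The "[Errno 2] No such file or directory: './playground/data/ocr_vqa/images/...'" entries show the run continuing past missing OCR-VQA images rather than crashing. A minimal sketch of a loader with that behaviour, assuming a PyTorch-style dataset; this is illustrative only, not the dataset class used in this run:

import os
import random
from PIL import Image
from torch.utils.data import Dataset

class TolerantImageTextDataset(Dataset):
    def __init__(self, records, image_root):
        self.records = records        # assumption: list of dicts with 'image' and 'conversations'
        self.image_root = image_root  # e.g. './playground/data/ocr_vqa/images'

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        rec = self.records[idx]
        path = os.path.join(self.image_root, rec["image"])
        try:
            image = Image.open(path).convert("RGB")
        except FileNotFoundError as err:
            # Log the miss and fall back to a random sample instead of stopping training.
            print(err)
            return self.__getitem__(random.randrange(len(self.records)))
        return {"image": image, "conversations": rec["conversations"]}
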
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.853, 'learning_rate': 1.973114012406544e-05, 'epoch': 0.1} {'loss': 0.9253, 'learning_rate': 1.9729703122109788e-05, 'epoch': 0.1} {'loss': 0.8711, 'learning_rate': 1.9728262342748384e-05, 'epoch': 0.1} {'loss': 0.8208, 'learning_rate': 1.9726817786540584e-05, 'epoch': 0.1} {'loss': 0.8867, 'learning_rate': 1.9725369454047215e-05, 'epoch': 0.1} {'loss': 0.8975, 'learning_rate': 1.9723917345830568e-05, 'epoch': 0.1} {'loss': 0.9175, 'learning_rate': 1.9722461462454405e-05, 'epoch': 0.1} {'loss': 0.8706, 'learning_rate': 1.9721001804483947e-05, 'epoch': 0.1} {'loss': 0.8401, 'learning_rate': 1.9719538372485887e-05, 'epoch': 0.1} {'loss': 0.9536, 'learning_rate': 1.9718071167028376e-05, 'epoch': 0.1} {'loss': 0.8457, 'learning_rate': 1.9716600188681038e-05, 'epoch': 0.1} {'loss': 0.8652, 'learning_rate': 1.971512543801495e-05, 'epoch': 0.1} {'loss': 0.8818, 'learning_rate': 1.9713646915602663e-05, 'epoch': 0.1} {'loss': 0.8955, 'learning_rate': 1.9712164622018197e-05, 'epoch': 0.1} {'loss': 0.8721, 'learning_rate': 1.9710678557837024e-05, 'epoch': 0.1} {'loss': 0.8379, 'learning_rate': 1.9709188723636088e-05, 'epoch': 0.1} {'loss': 0.8735, 'learning_rate': 1.970769511999379e-05, 'epoch': 0.1} {'loss': 0.8716, 'learning_rate': 1.9706197747490004e-05, 'epoch': 0.11} {'loss': 0.8564, 'learning_rate': 1.9704696606706055e-05, 'epoch': 0.11} {'loss': 0.8579, 'learning_rate': 1.9703191698224742e-05, 'epoch': 0.11} {'loss': 0.8955, 'learning_rate': 1.9701683022630323e-05, 'epoch': 0.11} {'loss': 0.9048, 'learning_rate': 1.9700170580508514e-05, 'epoch': 0.11} {'loss': 0.917, 'learning_rate': 1.9698654372446495e-05, 'epoch': 0.11} {'loss': 0.8882, 'learning_rate': 1.969713439903292e-05, 'epoch': 0.11} [2024-01-30 18:32:02,448] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8735, 'learning_rate': 1.9695610660857886e-05, 'epoch': 0.11} {'loss': 0.8594, 'learning_rate': 1.9694083158512965e-05, 'epoch': 0.11} {'loss': 0.9111, 'learning_rate': 1.9692551892591185e-05, 'epoch': 0.11} {'loss': 0.8999, 'learning_rate': 1.9691016863687037e-05, 'epoch': 0.11} {'loss': 0.8945, 'learning_rate': 1.968947807239647e-05, 'epoch': 0.11} {'loss': 0.8369, 'learning_rate': 1.9687935519316897e-05, 'epoch': 0.11} {'loss': 0.8516, 'learning_rate': 1.9686389205047186e-05, 'epoch': 0.11} {'loss': 0.8662, 'learning_rate': 1.9684839130187678e-05, 'epoch': 0.11} {'loss': 0.897, 'learning_rate': 1.968328529534016e-05, 'epoch': 0.11} {'loss': 0.8467, 'learning_rate': 1.9681727701107885e-05, 'epoch': 0.11} {'loss': 0.8506, 'learning_rate': 1.9680166348095568e-05, 'epoch': 0.11} {'loss': 0.8398, 'learning_rate': 1.967860123690937e-05, 'epoch': 0.11} {'loss': 0.834, 'learning_rate': 1.9677032368156934e-05, 'epoch': 0.11} {'loss': 0.8486, 'learning_rate': 1.967545974244734e-05, 'epoch': 0.11} {'loss': 0.8618, 'learning_rate': 1.9673883360391138e-05, 'epoch': 0.11} {'loss': 0.9053, 'learning_rate': 1.9672303222600333e-05, 'epoch': 0.11} {'loss': 0.8853, 'learning_rate': 1.967071932968839e-05, 'epoch': 0.11} {'loss': 0.9067, 'learning_rate': 1.9669131682270232e-05, 'epoch': 0.11} [2024-01-30 18:37:39,602] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8135, 'learning_rate': 1.9667540280962235e-05, 'epoch': 0.11} {'loss': 0.8135, 'learning_rate': 1.966594512638224e-05, 'epoch': 0.11} [2024-01-30 18:38:14,614] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8901, 'learning_rate': 1.9664346219149538e-05, 'epoch': 0.11} {'loss': 0.8652, 'learning_rate': 1.966274355988488e-05, 'epoch': 0.11} [2024-01-30 18:38:48,811] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8188, 'learning_rate': 1.9661137149210473e-05, 'epoch': 0.11} {'loss': 0.8823, 'learning_rate': 1.9659526987749987e-05, 'epoch': 0.11} [2024-01-30 18:39:27,576] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. 
if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.2593, 'learning_rate': 1.9657913076128532e-05, 'epoch': 0.11} [2024-01-30 18:39:46,488] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.2805, 'learning_rate': 1.965629541497269e-05, 'epoch': 0.11} [2024-01-30 18:40:05,701] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8589, 'learning_rate': 1.9654674004910493e-05, 'epoch': 0.11} {'loss': 0.8672, 'learning_rate': 1.9653048846571427e-05, 'epoch': 0.11} [2024-01-30 18:40:45,747] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.9023, 'learning_rate': 1.9651419940586437e-05, 'epoch': 0.11} {'loss': 0.853, 'learning_rate': 1.964978728758791e-05, 'epoch': 0.11} [2024-01-30 18:41:21,851] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8936, 'learning_rate': 1.9648150888209715e-05, 'epoch': 0.11} [2024-01-30 18:41:40,213] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
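The recurring stage3.py warning first suggests "adjusting settings to reduce memory consumption". For a ZeRO-3 run like this one, that usually means lowering the per-GPU micro-batch size or tightening the ZeRO-3 partitioning, prefetch, and offload knobs. The values below are illustrative assumptions, not this run's actual config:

# Illustrative ZeRO-3 knobs one might reduce under memory pressure (all values are assumptions).
ds_config = {
    "train_micro_batch_size_per_gpu": 8,           # often the first thing to lower
    "zero_optimization": {
        "stage": 3,
        "stage3_max_live_parameters": 5e8,         # fewer gathered params held at once
        "stage3_prefetch_bucket_size": 5e7,        # smaller prefetch buckets
        "stage3_param_persistence_threshold": 1e4,
        "reduce_bucket_size": 5e7,
        "offload_optimizer": {"device": "cpu"},    # optional: trade step time for GPU memory
    },
    "bf16": {"enabled": True},
}
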
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.9077, 'learning_rate': 1.9646510743087144e-05, 'epoch': 0.11} {'loss': 0.9067, 'learning_rate': 1.964486685285697e-05, 'epoch': 0.11} {'loss': 0.8535, 'learning_rate': 1.9643219218157395e-05, 'epoch': 0.11} {'loss': 0.8672, 'learning_rate': 1.9641567839628092e-05, 'epoch': 0.11} {'loss': 0.8721, 'learning_rate': 1.963991271791019e-05, 'epoch': 0.11} {'loss': 0.8745, 'learning_rate': 1.9638253853646255e-05, 'epoch': 0.11} {'loss': 0.834, 'learning_rate': 1.9636591247480323e-05, 'epoch': 0.11} {'loss': 0.8794, 'learning_rate': 1.9634924900057867e-05, 'epoch': 0.11} {'loss': 0.3042, 'learning_rate': 1.963325481202583e-05, 'epoch': 0.11} {'loss': 0.8701, 'learning_rate': 1.963158098403259e-05, 'epoch': 0.11} {'loss': 0.855, 'learning_rate': 1.9629903416727987e-05, 'epoch': 0.11} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/762704519.jpg' {'loss': 0.8291, 'learning_rate': 1.962822211076331e-05, 'epoch': 0.11} [2024-01-30 18:45:22,714] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.832, 'learning_rate': 1.96265370667913e-05, 'epoch': 0.11} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/312187114.jpg' {'loss': 0.8706, 'learning_rate': 1.9624848285466146e-05, 'epoch': 0.11} [2024-01-30 18:45:56,711] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8452, 'learning_rate': 1.9623155767443498e-05, 'epoch': 0.12} {'loss': 0.8569, 'learning_rate': 1.9621459513380445e-05, 'epoch': 0.12} [2024-01-30 18:46:32,071] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8589, 'learning_rate': 1.9619759523935532e-05, 'epoch': 0.12} [2024-01-30 18:46:51,306] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.9121, 'learning_rate': 1.9618055799768757e-05, 'epoch': 0.12} {'loss': 0.8022, 'learning_rate': 1.961634834154156e-05, 'epoch': 0.12} {'loss': 0.8604, 'learning_rate': 1.9614637149916834e-05, 'epoch': 0.12} {'loss': 0.8569, 'learning_rate': 1.9612922225558924e-05, 'epoch': 0.12} {'loss': 0.875, 'learning_rate': 1.961120356913363e-05, 'epoch': 0.12} {'loss': 0.855, 'learning_rate': 1.960948118130818e-05, 'epoch': 0.12} {'loss': 0.894, 'learning_rate': 1.9607755062751273e-05, 'epoch': 0.12} {'loss': 0.8633, 'learning_rate': 1.9606025214133046e-05, 'epoch': 0.12} {'loss': 0.2579, 'learning_rate': 1.9604291636125084e-05, 'epoch': 0.12} {'loss': 0.2312, 'learning_rate': 1.960255432940043e-05, 'epoch': 0.12} {'loss': 0.8145, 'learning_rate': 1.9600813294633552e-05, 'epoch': 0.12} {'loss': 0.8418, 'learning_rate': 1.9599068532500394e-05, 'epoch': 0.12} [2024-01-30 18:50:36,231] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8345, 'learning_rate': 1.9597320043678322e-05, 'epoch': 0.12} [2024-01-30 18:50:53,810] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8823, 'learning_rate': 1.9595567828846166e-05, 'epoch': 0.12} {'loss': 0.8545, 'learning_rate': 1.9593811888684192e-05, 'epoch': 0.12} {'loss': 0.8774, 'learning_rate': 1.9592052223874115e-05, 'epoch': 0.12} {'loss': 0.8303, 'learning_rate': 1.959028883509911e-05, 'epoch': 0.12} {'loss': 0.8599, 'learning_rate': 1.9588521723043764e-05, 'epoch': 0.12} {'loss': 0.8428, 'learning_rate': 1.958675088839415e-05, 'epoch': 0.12} {'loss': 0.8687, 'learning_rate': 1.9584976331837758e-05, 'epoch': 0.12} {'loss': 0.8599, 'learning_rate': 1.9583198054063535e-05, 'epoch': 0.12} {'loss': 0.8711, 'learning_rate': 1.9581416055761865e-05, 'epoch': 0.12} {'loss': 0.8594, 'learning_rate': 1.9579630337624585e-05, 'epoch': 0.12} {'loss': 0.9238, 'learning_rate': 1.9577840900344974e-05, 'epoch': 0.12} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/860208656.jpg' [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/521506743.jpg' {'loss': 0.897, 'learning_rate': 1.9576047744617752e-05, 'epoch': 0.12} {'loss': 0.853, 'learning_rate': 1.957425087113908e-05, 'epoch': 0.12} {'loss': 0.8057, 'learning_rate': 1.9572450280606568e-05, 'epoch': 0.12} {'loss': 0.894, 'learning_rate': 1.9570645973719273e-05, 'epoch': 0.12} {'loss': 0.8516, 'learning_rate': 1.9568837951177677e-05, 'epoch': 0.12} {'loss': 0.8398, 'learning_rate': 1.9567026213683728e-05, 'epoch': 0.12} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/671766627.jpg' {'loss': 0.8682, 'learning_rate': 1.9565210761940798e-05, 'epoch': 0.12} {'loss': 0.8887, 'learning_rate': 1.956339159665371e-05, 'epoch': 0.12} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/933261004.jpg' {'loss': 0.9033, 'learning_rate': 1.956156871852873e-05, 'epoch': 0.12} {'loss': 0.8643, 'learning_rate': 1.9559742128273558e-05, 'epoch': 0.12} {'loss': 0.8823, 'learning_rate': 1.9557911826597337e-05, 'epoch': 0.12} {'loss': 0.8413, 'learning_rate': 1.9556077814210662e-05, 'epoch': 0.12} {'loss': 0.8574, 'learning_rate': 1.955424009182555e-05, 'epoch': 0.12} {'loss': 0.8403, 'learning_rate': 1.955239866015547e-05, 'epoch': 0.12} {'loss': 0.7993, 'learning_rate': 1.9550553519915335e-05, 'epoch': 0.12} {'loss': 0.8442, 'learning_rate': 1.954870467182149e-05, 'epoch': 0.12} {'loss': 0.8809, 'learning_rate': 1.954685211659172e-05, 'epoch': 0.12} {'loss': 0.8398, 'learning_rate': 1.9544995854945248e-05, 'epoch': 0.12} {'loss': 0.8823, 'learning_rate': 1.954313588760274e-05, 'epoch': 0.12} {'loss': 0.8608, 'learning_rate': 1.9541272215286304e-05, 'epoch': 0.12} {'loss': 0.8018, 'learning_rate': 1.9539404838719477e-05, 'epoch': 0.12} {'loss': 0.8486, 'learning_rate': 1.9537533758627242e-05, 'epoch': 0.12} {'loss': 0.8159, 'learning_rate': 1.953565897573601e-05, 'epoch': 0.12} {'loss': 0.8901, 'learning_rate': 1.9533780490773645e-05, 'epoch': 0.12} {'loss': 0.8169, 'learning_rate': 1.9531898304469435e-05, 'epoch': 0.12} [2024-01-30 19:01:58,205] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
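The {'loss': ..., 'learning_rate': ..., 'epoch': ...} dicts are the trainer's per-step metrics, and pulling them out of a saved copy of this output is enough to plot the loss curve. A small helper for that, assuming the log was saved as train.log (the filename is an assumption):

import ast
import re

PATTERN = re.compile(r"\{'loss':.*?\}")  # the metric dicts never nest, so a non-greedy match is enough

def read_metrics(path="train.log"):
    rows = []
    with open(path) as f:
        for match in PATTERN.finditer(f.read()):
            rows.append(ast.literal_eval(match.group(0)))  # each dict is a valid Python literal
    return rows

# Example: losses = [r["loss"] for r in read_metrics()]
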
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7988, 'learning_rate': 1.953001241755411e-05, 'epoch': 0.13} [2024-01-30 19:02:17,100] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8555, 'learning_rate': 1.952812283075984e-05, 'epoch': 0.13} {'loss': 0.8921, 'learning_rate': 1.952622954482022e-05, 'epoch': 0.13} {'loss': 0.8643, 'learning_rate': 1.9524332560470293e-05, 'epoch': 0.13} {'loss': 0.9224, 'learning_rate': 1.9522431878446536e-05, 'epoch': 0.13} {'loss': 0.8643, 'learning_rate': 1.9520527499486856e-05, 'epoch': 0.13} {'loss': 0.9307, 'learning_rate': 1.95186194243306e-05, 'epoch': 0.13} {'loss': 0.3027, 'learning_rate': 1.9516707653718546e-05, 'epoch': 0.13} {'loss': 0.8452, 'learning_rate': 1.9514792188392914e-05, 'epoch': 0.13} {'loss': 0.8042, 'learning_rate': 1.9512873029097347e-05, 'epoch': 0.13} {'loss': 0.8511, 'learning_rate': 1.9510950176576933e-05, 'epoch': 0.13} {'loss': 0.8398, 'learning_rate': 1.950902363157819e-05, 'epoch': 0.13} {'loss': 0.8628, 'learning_rate': 1.950709339484907e-05, 'epoch': 0.13} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/785807209.jpg' {'loss': 0.8306, 'learning_rate': 1.9505159467138954e-05, 'epoch': 0.13} {'loss': 0.8965, 'learning_rate': 1.9503221849198655e-05, 'epoch': 0.13} {'loss': 0.8267, 'learning_rate': 1.9501280541780435e-05, 'epoch': 0.13} {'loss': 0.8608, 'learning_rate': 1.9499335545637968e-05, 'epoch': 0.13} {'loss': 0.8657, 'learning_rate': 1.949738686152637e-05, 'epoch': 0.13} {'loss': 0.897, 'learning_rate': 1.9495434490202188e-05, 'epoch': 0.13} {'loss': 0.8291, 'learning_rate': 1.94934784324234e-05, 'epoch': 0.13} {'loss': 0.8872, 'learning_rate': 1.9491518688949417e-05, 'epoch': 0.13} {'loss': 0.8003, 'learning_rate': 1.9489555260541074e-05, 'epoch': 0.13} {'loss': 0.8291, 'learning_rate': 1.948758814796064e-05, 'epoch': 0.13} {'loss': 0.876, 'learning_rate': 1.9485617351971827e-05, 'epoch': 0.13} {'loss': 0.3079, 'learning_rate': 1.9483642873339753e-05, 'epoch': 0.13} {'loss': 0.7915, 'learning_rate': 1.9481664712830987e-05, 'epoch': 0.13} {'loss': 0.8496, 'learning_rate': 1.9479682871213515e-05, 'epoch': 0.13} {'loss': 0.2825, 'learning_rate': 1.9477697349256756e-05, 'epoch': 0.13} {'loss': 0.8706, 'learning_rate': 1.947570814773156e-05, 'epoch': 0.13} {'loss': 0.8364, 'learning_rate': 1.9473715267410206e-05, 'epoch': 0.13} {'loss': 0.261, 'learning_rate': 1.9471718709066392e-05, 'epoch': 0.13} {'loss': 0.9009, 'learning_rate': 1.9469718473475256e-05, 'epoch': 0.13} {'loss': 0.8984, 'learning_rate': 1.9467714561413358e-05, 'epoch': 0.13} {'loss': 0.9106, 'learning_rate': 1.9465706973658683e-05, 'epoch': 0.13} {'loss': 0.8994, 'learning_rate': 1.9463695710990648e-05, 'epoch': 0.13} {'loss': 0.8628, 'learning_rate': 1.946168077419009e-05, 'epoch': 0.13} {'loss': 0.8579, 'learning_rate': 1.9459662164039283e-05, 'epoch': 0.13} {'loss': 0.8755, 'learning_rate': 1.9457639881321917e-05, 'epoch': 0.13} {'loss': 0.877, 'learning_rate': 
1.9455613926823115e-05, 'epoch': 0.13} {'loss': 0.8862, 'learning_rate': 1.945358430132942e-05, 'epoch': 0.13} {'loss': 0.8672, 'learning_rate': 1.9451551005628803e-05, 'epoch': 0.13} {'loss': 0.8965, 'learning_rate': 1.9449514040510654e-05, 'epoch': 0.13} {'loss': 0.261, 'learning_rate': 1.9447473406765803e-05, 'epoch': 0.13} {'loss': 0.8501, 'learning_rate': 1.9445429105186487e-05, 'epoch': 0.13} {'loss': 0.8257, 'learning_rate': 1.9443381136566382e-05, 'epoch': 0.13} {'loss': 0.8423, 'learning_rate': 1.9441329501700568e-05, 'epoch': 0.13} {'loss': 0.8374, 'learning_rate': 1.943927420138557e-05, 'epoch': 0.13} {'loss': 0.8364, 'learning_rate': 1.9437215236419322e-05, 'epoch': 0.13} {'loss': 0.8696, 'learning_rate': 1.9435152607601187e-05, 'epoch': 0.13} {'loss': 0.854, 'learning_rate': 1.943308631573195e-05, 'epoch': 0.13} {'loss': 0.8491, 'learning_rate': 1.9431016361613816e-05, 'epoch': 0.13} {'loss': 0.8521, 'learning_rate': 1.9428942746050406e-05, 'epoch': 0.13} {'loss': 0.8721, 'learning_rate': 1.9426865469846773e-05, 'epoch': 0.14} {'loss': 0.2294, 'learning_rate': 1.9424784533809393e-05, 'epoch': 0.14} {'loss': 0.8818, 'learning_rate': 1.942269993874615e-05, 'epoch': 0.14} {'loss': 0.8667, 'learning_rate': 1.9420611685466358e-05, 'epoch': 0.14} {'loss': 0.9053, 'learning_rate': 1.9418519774780748e-05, 'epoch': 0.14} {'loss': 0.8252, 'learning_rate': 1.9416424207501474e-05, 'epoch': 0.14} {'loss': 0.8462, 'learning_rate': 1.9414324984442102e-05, 'epoch': 0.14} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/081182568X.jpg' {'loss': 0.8296, 'learning_rate': 1.9412222106417632e-05, 'epoch': 0.14} [2024-01-30 19:20:24,712] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8193, 'learning_rate': 1.9410115574244462e-05, 'epoch': 0.14} [2024-01-30 19:20:51,106] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
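When adjusting settings is not enough, the same warning recommends synchronized get_accelerator().empty_cache() calls. A minimal sketch of that remedy, written as a hand-rolled DeepSpeed loop for clarity; this run presumably drives training through a Trainer, so the actual hook point would differ:

from deepspeed.accelerator import get_accelerator

EMPTY_CACHE_EVERY = 50  # assumed cadence; pick something close to how often the warning fires

def train(engine, dataloader):
    # engine: an already-initialized DeepSpeed engine wrapping the model (assumption)
    for step, batch in enumerate(dataloader):
        loss = engine(**batch).loss   # assumes an HF-style model output with a .loss field
        engine.backward(loss)
        engine.step()
        if step % EMPTY_CACHE_EVERY == 0:
            # Same step on every rank, so all ranks flush their allocator caches at the same time.
            get_accelerator().empty_cache()
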
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8848, 'learning_rate': 1.9408005388740433e-05, 'epoch': 0.14} {'loss': 0.8525, 'learning_rate': 1.9405891550724778e-05, 'epoch': 0.14} {'loss': 0.8931, 'learning_rate': 1.940377406101817e-05, 'epoch': 0.14} {'loss': 0.8789, 'learning_rate': 1.9401652920442694e-05, 'epoch': 0.14} {'loss': 0.8555, 'learning_rate': 1.9399528129821842e-05, 'epoch': 0.14} {'loss': 0.937, 'learning_rate': 1.939739968998054e-05, 'epoch': 0.14} {'loss': 0.8696, 'learning_rate': 1.939526760174511e-05, 'epoch': 0.14} {'loss': 0.9126, 'learning_rate': 1.939313186594331e-05, 'epoch': 0.14} {'loss': 0.9048, 'learning_rate': 1.9390992483404308e-05, 'epoch': 0.14} {'loss': 0.8101, 'learning_rate': 1.938884945495868e-05, 'epoch': 0.14} {'loss': 0.8647, 'learning_rate': 1.9386702781438425e-05, 'epoch': 0.14} [2024-01-30 19:24:16,103] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8716, 'learning_rate': 1.938455246367696e-05, 'epoch': 0.14} {'loss': 0.9019, 'learning_rate': 1.9382398502509107e-05, 'epoch': 0.14} {'loss': 0.8423, 'learning_rate': 1.938024089877111e-05, 'epoch': 0.14} {'loss': 0.8496, 'learning_rate': 1.9378079653300624e-05, 'epoch': 0.14} {'loss': 0.8618, 'learning_rate': 1.9375914766936723e-05, 'epoch': 0.14} {'loss': 0.854, 'learning_rate': 1.9373746240519884e-05, 'epoch': 0.14} {'loss': 0.8843, 'learning_rate': 1.937157407489201e-05, 'epoch': 0.14} {'loss': 0.8833, 'learning_rate': 1.9369398270896403e-05, 'epoch': 0.14} {'loss': 0.854, 'learning_rate': 1.936721882937779e-05, 'epoch': 0.14} {'loss': 0.2396, 'learning_rate': 1.9365035751182307e-05, 'epoch': 0.14} {'loss': 0.8599, 'learning_rate': 1.93628490371575e-05, 'epoch': 0.14} {'loss': 0.8594, 'learning_rate': 1.9360658688152322e-05, 'epoch': 0.14} {'loss': 0.8774, 'learning_rate': 1.9358464705017143e-05, 'epoch': 0.14} {'loss': 0.8276, 'learning_rate': 1.9356267088603745e-05, 'epoch': 0.14} {'loss': 0.8838, 'learning_rate': 1.9354065839765316e-05, 'epoch': 0.14} {'loss': 0.8135, 'learning_rate': 1.9351860959356462e-05, 'epoch': 0.14} {'loss': 0.8726, 'learning_rate': 1.9349652448233187e-05, 'epoch': 0.14} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1566866391.jpg' {'loss': 0.8311, 'learning_rate': 1.934744030725291e-05, 'epoch': 0.14} {'loss': 0.8599, 'learning_rate': 1.934522453727447e-05, 'epoch': 0.14} {'loss': 0.8374, 'learning_rate': 1.93430051391581e-05, 'epoch': 0.14} {'loss': 0.8398, 'learning_rate': 1.934078211376544e-05, 'epoch': 0.14} {'loss': 0.8564, 'learning_rate': 1.9338555461959554e-05, 'epoch': 0.14} {'loss': 0.8555, 'learning_rate': 1.93363251846049e-05, 'epoch': 0.14} {'loss': 0.8745, 'learning_rate': 1.9334091282567352e-05, 'epoch': 0.14} {'loss': 0.8501, 'learning_rate': 1.9331853756714185e-05, 'epoch': 0.14} {'loss': 0.8799, 'learning_rate': 1.9329612607914088e-05, 'epoch': 0.14} {'loss': 0.8599, 'learning_rate': 1.9327367837037142e-05, 'epoch': 0.14} {'loss': 0.252, 'learning_rate': 
1.9325119444954855e-05, 'epoch': 0.14} {'loss': 0.2444, 'learning_rate': 1.9322867432540126e-05, 'epoch': 0.14} {'loss': 0.8745, 'learning_rate': 1.9320611800667268e-05, 'epoch': 0.14} {'loss': 0.9053, 'learning_rate': 1.9318352550211986e-05, 'epoch': 0.14} {'loss': 0.8765, 'learning_rate': 1.9316089682051403e-05, 'epoch': 0.14} {'loss': 0.8203, 'learning_rate': 1.9313823197064042e-05, 'epoch': 0.15} {'loss': 0.8545, 'learning_rate': 1.9311553096129835e-05, 'epoch': 0.15} {'loss': 0.855, 'learning_rate': 1.9309279380130112e-05, 'epoch': 0.15} {'loss': 0.8638, 'learning_rate': 1.93070020499476e-05, 'epoch': 0.15} {'loss': 0.9248, 'learning_rate': 1.930472110646645e-05, 'epoch': 0.15} {'loss': 0.9219, 'learning_rate': 1.9302436550572187e-05, 'epoch': 0.15} {'loss': 0.8638, 'learning_rate': 1.930014838315177e-05, 'epoch': 0.15} {'loss': 0.874, 'learning_rate': 1.9297856605093534e-05, 'epoch': 0.15} {'loss': 0.8359, 'learning_rate': 1.9295561217287226e-05, 'epoch': 0.15} {'loss': 0.833, 'learning_rate': 1.9293262220624002e-05, 'epoch': 0.15} {'loss': 0.9019, 'learning_rate': 1.9290959615996407e-05, 'epoch': 0.15} {'loss': 0.8481, 'learning_rate': 1.9288653404298392e-05, 'epoch': 0.15} {'loss': 0.8662, 'learning_rate': 1.9286343586425307e-05, 'epoch': 0.15} {'loss': 0.8174, 'learning_rate': 1.9284030163273907e-05, 'epoch': 0.15} {'loss': 0.8628, 'learning_rate': 1.9281713135742333e-05, 'epoch': 0.15} {'loss': 0.8618, 'learning_rate': 1.9279392504730147e-05, 'epoch': 0.15} {'loss': 0.8521, 'learning_rate': 1.9277068271138287e-05, 'epoch': 0.15} {'loss': 0.9238, 'learning_rate': 1.9274740435869107e-05, 'epoch': 0.15} {'loss': 0.8467, 'learning_rate': 1.927240899982635e-05, 'epoch': 0.15} {'loss': 0.8418, 'learning_rate': 1.9270073963915162e-05, 'epoch': 0.15} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1572240466.jpg' {'loss': 0.8691, 'learning_rate': 1.9267735329042086e-05, 'epoch': 0.15} {'loss': 0.8848, 'learning_rate': 1.9265393096115056e-05, 'epoch': 0.15} {'loss': 0.8311, 'learning_rate': 1.926304726604341e-05, 'epoch': 0.15} {'loss': 0.834, 'learning_rate': 1.9260697839737875e-05, 'epoch': 0.15} {'loss': 0.8691, 'learning_rate': 1.925834481811059e-05, 'epoch': 0.15} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/763115509.jpg' {'loss': 0.8022, 'learning_rate': 1.9255988202075065e-05, 'epoch': 0.15} {'loss': 0.8843, 'learning_rate': 1.925362799254623e-05, 'epoch': 0.15} {'loss': 0.8262, 'learning_rate': 1.9251264190440398e-05, 'epoch': 0.15} {'loss': 0.8623, 'learning_rate': 1.9248896796675277e-05, 'epoch': 0.15} {'loss': 0.8013, 'learning_rate': 1.924652581216997e-05, 'epoch': 0.15} {'loss': 0.8687, 'learning_rate': 1.9244151237844975e-05, 'epoch': 0.15} {'loss': 0.8398, 'learning_rate': 1.9241773074622182e-05, 'epoch': 0.15} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/812903390.jpg' {'loss': 0.8364, 'learning_rate': 1.923939132342488e-05, 'epoch': 0.15} {'loss': 0.8799, 'learning_rate': 1.923700598517775e-05, 'epoch': 0.15} {'loss': 0.7476, 'learning_rate': 1.923461706080685e-05, 'epoch': 0.15} [2024-01-30 19:44:38,265] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8086, 'learning_rate': 1.923222455123965e-05, 'epoch': 0.15} [2024-01-30 19:44:58,589] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.2766, 'learning_rate': 1.9229828457405005e-05, 'epoch': 0.15} {'loss': 0.8779, 'learning_rate': 1.9227428780233162e-05, 'epoch': 0.15} [2024-01-30 19:45:34,731] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.2823, 'learning_rate': 1.922502552065576e-05, 'epoch': 0.15} [2024-01-30 19:45:52,294] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8955, 'learning_rate': 1.922261867960582e-05, 'epoch': 0.15} {'loss': 0.2386, 'learning_rate': 1.9220208258017763e-05, 'epoch': 0.15} {'loss': 0.7917, 'learning_rate': 1.92177942568274e-05, 'epoch': 0.15} {'loss': 0.8647, 'learning_rate': 1.921537667697193e-05, 'epoch': 0.15} {'loss': 0.9014, 'learning_rate': 1.9212955519389938e-05, 'epoch': 0.15} {'loss': 0.8638, 'learning_rate': 1.9210530785021405e-05, 'epoch': 0.15} {'loss': 0.8877, 'learning_rate': 1.9208102474807692e-05, 'epoch': 0.15} {'loss': 0.8174, 'learning_rate': 1.920567058969155e-05, 'epoch': 0.15} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/968297072.jpg' [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/125476604.jpg' {'loss': 0.8789, 'learning_rate': 1.920323513061713e-05, 'epoch': 0.15} {'loss': 0.8335, 'learning_rate': 1.9200796098529956e-05, 'epoch': 0.15} {'loss': 0.7998, 'learning_rate': 1.919835349437694e-05, 'epoch': 0.15} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/811820580.jpg' {'loss': 0.8354, 'learning_rate': 1.9195907319106394e-05, 'epoch': 0.15} {'loss': 0.8472, 'learning_rate': 1.9193457573667996e-05, 'epoch': 0.15} {'loss': 0.8135, 'learning_rate': 1.919100425901283e-05, 'epoch': 0.16} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/3575228701.jpg' {'loss': 0.7842, 'learning_rate': 1.9188547376093355e-05, 'epoch': 0.16} {'loss': 0.9043, 'learning_rate': 1.918608692586342e-05, 'epoch': 0.16} {'loss': 0.8467, 'learning_rate': 1.918362290927825e-05, 'epoch': 0.16} {'loss': 0.8398, 'learning_rate': 1.9181155327294468e-05, 'epoch': 0.16} {'loss': 
0.8477, 'learning_rate': 1.9178684180870072e-05, 'epoch': 0.16} {'loss': 0.834, 'learning_rate': 1.9176209470964446e-05, 'epoch': 0.16} {'loss': 0.8423, 'learning_rate': 1.9173731198538354e-05, 'epoch': 0.16} {'loss': 0.8047, 'learning_rate': 1.9171249364553956e-05, 'epoch': 0.16} {'loss': 0.8691, 'learning_rate': 1.9168763969974773e-05, 'epoch': 0.16} {'loss': 0.8657, 'learning_rate': 1.916627501576573e-05, 'epoch': 0.16} {'loss': 0.8193, 'learning_rate': 1.916378250289312e-05, 'epoch': 0.16} {'loss': 0.895, 'learning_rate': 1.9161286432324628e-05, 'epoch': 0.16} {'loss': 0.8477, 'learning_rate': 1.9158786805029307e-05, 'epoch': 0.16} {'loss': 0.8701, 'learning_rate': 1.9156283621977603e-05, 'epoch': 0.16} {'loss': 0.8359, 'learning_rate': 1.9153776884141336e-05, 'epoch': 0.16} {'loss': 0.8384, 'learning_rate': 1.915126659249371e-05, 'epoch': 0.16} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/134412052.jpg' {'loss': 0.8105, 'learning_rate': 1.9148752748009304e-05, 'epoch': 0.16} {'loss': 0.8794, 'learning_rate': 1.914623535166408e-05, 'epoch': 0.16} {'loss': 0.9043, 'learning_rate': 1.9143714404435382e-05, 'epoch': 0.16} {'loss': 0.8638, 'learning_rate': 1.9141189907301922e-05, 'epoch': 0.16} {'loss': 0.8999, 'learning_rate': 1.9138661861243802e-05, 'epoch': 0.16} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/679761799.jpg' {'loss': 0.8359, 'learning_rate': 1.913613026724249e-05, 'epoch': 0.16} {'loss': 0.8662, 'learning_rate': 1.9133595126280848e-05, 'epoch': 0.16} {'loss': 0.811, 'learning_rate': 1.9131056439343095e-05, 'epoch': 0.16} {'loss': 0.8394, 'learning_rate': 1.9128514207414838e-05, 'epoch': 0.16} {'loss': 0.8286, 'learning_rate': 1.9125968431483068e-05, 'epoch': 0.16} {'loss': 0.7998, 'learning_rate': 1.9123419112536132e-05, 'epoch': 0.16} {'loss': 0.8696, 'learning_rate': 1.912086625156377e-05, 'epoch': 0.16} {'loss': 0.9219, 'learning_rate': 1.911830984955709e-05, 'epoch': 0.16} {'loss': 0.8267, 'learning_rate': 1.911574990750857e-05, 'epoch': 0.16} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/156458321X.jpg' {'loss': 0.8652, 'learning_rate': 1.9113186426412073e-05, 'epoch': 0.16} {'loss': 0.8716, 'learning_rate': 1.9110619407262828e-05, 'epoch': 0.16} {'loss': 0.2822, 'learning_rate': 1.9108048851057447e-05, 'epoch': 0.16} {'loss': 0.8877, 'learning_rate': 1.9105474758793897e-05, 'epoch': 0.16} {'loss': 0.8438, 'learning_rate': 1.9102897131471536e-05, 'epoch': 0.16} {'loss': 0.8911, 'learning_rate': 1.9100315970091088e-05, 'epoch': 0.16} {'loss': 0.835, 'learning_rate': 1.9097731275654645e-05, 'epoch': 0.16} {'loss': 0.8472, 'learning_rate': 1.909514304916568e-05, 'epoch': 0.16} {'loss': 0.8569, 'learning_rate': 1.9092551291629026e-05, 'epoch': 0.16} {'loss': 0.8555, 'learning_rate': 1.9089956004050893e-05, 'epoch': 0.16} {'loss': 0.2599, 'learning_rate': 1.908735718743887e-05, 'epoch': 0.16} {'loss': 0.248, 'learning_rate': 1.908475484280189e-05, 'epoch': 0.16} {'loss': 0.9097, 'learning_rate': 1.908214897115029e-05, 'epoch': 0.16} {'loss': 0.8726, 'learning_rate': 1.907953957349575e-05, 'epoch': 0.16} {'loss': 0.8369, 'learning_rate': 1.907692665085133e-05, 'epoch': 0.16} {'loss': 0.8872, 'learning_rate': 1.9074310204231457e-05, 'epoch': 0.16} {'loss': 0.8555, 'learning_rate': 1.9071690234651923e-05, 'epoch': 0.16} {'loss': 0.8882, 'learning_rate': 1.9069066743129893e-05, 'epoch': 0.16} {'loss': 0.75, 'learning_rate': 1.90664397306839e-05, 'epoch': 0.16} [Errno 2] No such file or directory: 
'./playground/data/ocr_vqa/images/933821131.jpg' {'loss': 0.8398, 'learning_rate': 1.9063809198333832e-05, 'epoch': 0.16} {'loss': 0.8232, 'learning_rate': 1.9061175147100957e-05, 'epoch': 0.16} {'loss': 0.8481, 'learning_rate': 1.905853757800791e-05, 'epoch': 0.17} {'loss': 0.9014, 'learning_rate': 1.9055896492078675e-05, 'epoch': 0.17} [2024-01-30 20:06:47,875] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.853, 'learning_rate': 1.905325189033862e-05, 'epoch': 0.17} {'loss': 0.8193, 'learning_rate': 1.905060377381447e-05, 'epoch': 0.17} {'loss': 0.8687, 'learning_rate': 1.904795214353431e-05, 'epoch': 0.17} {'loss': 0.9258, 'learning_rate': 1.90452970005276e-05, 'epoch': 0.17} {'loss': 0.8179, 'learning_rate': 1.9042638345825155e-05, 'epoch': 0.17} {'loss': 0.8818, 'learning_rate': 1.9039976180459158e-05, 'epoch': 0.17} {'loss': 0.835, 'learning_rate': 1.9037310505463153e-05, 'epoch': 0.17} {'loss': 0.2413, 'learning_rate': 1.9034641321872043e-05, 'epoch': 0.17} {'loss': 0.8853, 'learning_rate': 1.9031968630722104e-05, 'epoch': 0.17} {'loss': 0.8735, 'learning_rate': 1.902929243305096e-05, 'epoch': 0.17} {'loss': 0.8369, 'learning_rate': 1.902661272989761e-05, 'epoch': 0.17} {'loss': 0.8252, 'learning_rate': 1.9023929522302394e-05, 'epoch': 0.17} {'loss': 0.8379, 'learning_rate': 1.9021242811307044e-05, 'epoch': 0.17} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1886947694.jpg' {'loss': 0.8159, 'learning_rate': 1.901855259795462e-05, 'epoch': 0.17} [2024-01-30 20:11:13,409] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8174, 'learning_rate': 1.9015858883289556e-05, 'epoch': 0.17} [2024-01-30 20:11:32,229] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8452, 'learning_rate': 1.9013161668357655e-05, 'epoch': 0.17} [2024-01-30 20:11:50,687] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8223, 'learning_rate': 1.901046095420606e-05, 'epoch': 0.17} {'loss': 0.811, 'learning_rate': 1.9007756741883284e-05, 'epoch': 0.17} {'loss': 0.8374, 'learning_rate': 1.9005049032439193e-05, 'epoch': 0.17} {'loss': 0.8521, 'learning_rate': 1.9002337826925012e-05, 'epoch': 0.17} {'loss': 0.8525, 'learning_rate': 1.899962312639333e-05, 'epoch': 0.17} {'loss': 0.8296, 'learning_rate': 1.8996904931898085e-05, 'epoch': 0.17} {'loss': 0.2391, 'learning_rate': 1.899418324449457e-05, 'epoch': 0.17} {'loss': 0.8022, 'learning_rate': 1.8991458065239444e-05, 'epoch': 0.17} {'loss': 0.8452, 'learning_rate': 1.8988729395190712e-05, 'epoch': 0.17} {'loss': 0.8564, 'learning_rate': 1.8985997235407735e-05, 'epoch': 0.17} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/879801654.jpg' {'loss': 0.8618, 'learning_rate': 1.898326158695124e-05, 'epoch': 0.17} {'loss': 0.897, 'learning_rate': 1.8980522450883287e-05, 'epoch': 0.17} {'loss': 0.811, 'learning_rate': 1.8977779828267314e-05, 'epoch': 0.17} {'loss': 0.8872, 'learning_rate': 1.8975033720168094e-05, 'epoch': 0.17} {'loss': 0.8027, 'learning_rate': 1.897228412765177e-05, 'epoch': 0.17} {'loss': 0.8525, 'learning_rate': 1.896953105178582e-05, 'epoch': 0.17} {'loss': 0.8604, 'learning_rate': 1.8966774493639084e-05, 'epoch': 0.17} {'loss': 0.8706, 'learning_rate': 1.896401445428176e-05, 'epoch': 0.17} {'loss': 0.9028, 'learning_rate': 1.896125093478538e-05, 'epoch': 0.17} {'loss': 0.7729, 'learning_rate': 1.895848393622284e-05, 'epoch': 0.17} {'loss': 0.8242, 'learning_rate': 1.895571345966839e-05, 'epoch': 0.17} {'loss': 0.8198, 'learning_rate': 1.8952939506197622e-05, 'epoch': 0.17} {'loss': 0.2609, 'learning_rate': 1.8950162076887477e-05, 'epoch': 0.17} {'loss': 0.8379, 'learning_rate': 1.894738117281625e-05, 'epoch': 0.17} {'loss': 0.8291, 'learning_rate': 1.8944596795063584e-05, 'epoch': 0.17} [2024-01-30 20:19:29,669] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8647, 'learning_rate': 1.894180894471047e-05, 'epoch': 0.17} [2024-01-30 20:19:53,450] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8696, 'learning_rate': 1.8939017622839253e-05, 'epoch': 0.17} [2024-01-30 20:20:12,855] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8599, 'learning_rate': 1.8936222830533613e-05, 'epoch': 0.17} {'loss': 0.8423, 'learning_rate': 1.8933424568878586e-05, 'epoch': 0.17} [2024-01-30 20:20:53,567] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8193, 'learning_rate': 1.8930622838960555e-05, 'epoch': 0.17} {'loss': 0.897, 'learning_rate': 1.8927817641867244e-05, 'epoch': 0.17} {'loss': 0.877, 'learning_rate': 1.8925008978687737e-05, 'epoch': 0.17} {'loss': 0.7993, 'learning_rate': 1.8922196850512446e-05, 'epoch': 0.17} [2024-01-30 20:22:03,813] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.2872, 'learning_rate': 1.8919381258433135e-05, 'epoch': 0.17} [2024-01-30 20:22:21,677] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8652, 'learning_rate': 1.8916562203542916e-05, 'epoch': 0.18} {'loss': 0.8442, 'learning_rate': 1.8913739686936244e-05, 'epoch': 0.18} [2024-01-30 20:22:59,865] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8506, 'learning_rate': 1.8910913709708918e-05, 'epoch': 0.18} [2024-01-30 20:23:17,814] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8623, 'learning_rate': 1.8908084272958077e-05, 'epoch': 0.18} {'loss': 0.8345, 'learning_rate': 1.8905251377782206e-05, 'epoch': 0.18} {'loss': 0.8174, 'learning_rate': 1.8902415025281136e-05, 'epoch': 0.18} {'loss': 0.8652, 'learning_rate': 1.889957521655603e-05, 'epoch': 0.18} {'loss': 0.834, 'learning_rate': 1.8896731952709408e-05, 'epoch': 0.18} {'loss': 0.8262, 'learning_rate': 1.8893885234845117e-05, 'epoch': 0.18} {'loss': 0.8789, 'learning_rate': 1.8891035064068354e-05, 'epoch': 0.18} {'loss': 0.8291, 'learning_rate': 1.888818144148565e-05, 'epoch': 0.18} {'loss': 0.8467, 'learning_rate': 1.888532436820488e-05, 'epoch': 0.18} {'loss': 0.7822, 'learning_rate': 1.8882463845335263e-05, 'epoch': 0.18} {'loss': 0.793, 'learning_rate': 1.8879599873987343e-05, 'epoch': 0.18} {'loss': 0.8027, 'learning_rate': 1.8876732455273022e-05, 'epoch': 0.18} {'loss': 0.8208, 'learning_rate': 1.8873861590305527e-05, 'epoch': 0.18} {'loss': 0.8696, 'learning_rate': 1.8870987280199428e-05, 'epoch': 0.18} {'loss': 0.2939, 'learning_rate': 1.886810952607063e-05, 'epoch': 0.18} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/078942049X.jpg' {'loss': 0.8789, 'learning_rate': 1.8865228329036372e-05, 'epoch': 0.18} {'loss': 0.8262, 'learning_rate': 1.886234369021524e-05, 'epoch': 0.18} {'loss': 0.8989, 'learning_rate': 1.885945561072715e-05, 'epoch': 0.18} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1580621783.jpg' {'loss': 0.8364, 'learning_rate': 1.885656409169335e-05, 'epoch': 0.18} [2024-01-30 20:30:22,753] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
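Across this stretch the learning_rate column drifts smoothly downward from roughly 1.99e-5 to 1.88e-5, which is consistent with the usual linear-warmup plus cosine-decay schedule. A sketch of that formula; the peak learning rate and step counts are assumptions inferred from the logged values, not read from any config:

import math

def lr_at(step, peak_lr=2e-5, warmup_steps=150, total_steps=40000):
    # Linear warmup to peak_lr, then cosine decay toward zero (all three numbers are assumptions).
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
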
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8306, 'learning_rate': 1.885366913423643e-05, 'epoch': 0.18} {'loss': 0.8755, 'learning_rate': 1.8850770739480312e-05, 'epoch': 0.18} {'loss': 0.8765, 'learning_rate': 1.8847868908550252e-05, 'epoch': 0.18} {'loss': 0.7847, 'learning_rate': 1.8844963642572837e-05, 'epoch': 0.18} {'loss': 0.8071, 'learning_rate': 1.8842054942676e-05, 'epoch': 0.18} {'loss': 0.8447, 'learning_rate': 1.8839142809988987e-05, 'epoch': 0.18} {'loss': 0.8115, 'learning_rate': 1.88362272456424e-05, 'epoch': 0.18} {'loss': 0.7866, 'learning_rate': 1.8833308250768153e-05, 'epoch': 0.18} {'loss': 0.8311, 'learning_rate': 1.8830385826499507e-05, 'epoch': 0.18} {'loss': 0.8169, 'learning_rate': 1.882745997397104e-05, 'epoch': 0.18} {'loss': 0.8354, 'learning_rate': 1.8824530694318675e-05, 'epoch': 0.18} {'loss': 0.8281, 'learning_rate': 1.882159798867966e-05, 'epoch': 0.18} {'loss': 0.8325, 'learning_rate': 1.8818661858192562e-05, 'epoch': 0.18} {'loss': 0.9043, 'learning_rate': 1.88157223039973e-05, 'epoch': 0.18} {'loss': 0.854, 'learning_rate': 1.8812779327235106e-05, 'epoch': 0.18} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/840734921.jpg' {'loss': 0.3052, 'learning_rate': 1.880983292904854e-05, 'epoch': 0.18} {'loss': 0.8403, 'learning_rate': 1.88068831105815e-05, 'epoch': 0.18} {'loss': 0.8491, 'learning_rate': 1.8803929872979214e-05, 'epoch': 0.18} {'loss': 0.8496, 'learning_rate': 1.8800973217388215e-05, 'epoch': 0.18} {'loss': 0.833, 'learning_rate': 1.879801314495639e-05, 'epoch': 0.18} {'loss': 0.853, 'learning_rate': 1.879504965683294e-05, 'epoch': 0.18} {'loss': 0.8652, 'learning_rate': 1.8792082754168385e-05, 'epoch': 0.18} {'loss': 0.7935, 'learning_rate': 1.878911243811459e-05, 'epoch': 0.18} {'loss': 0.8027, 'learning_rate': 1.8786138709824726e-05, 'epoch': 0.18} {'loss': 0.7778, 'learning_rate': 1.8783161570453295e-05, 'epoch': 0.18} {'loss': 0.8096, 'learning_rate': 1.878018102115614e-05, 'epoch': 0.18} {'loss': 0.8506, 'learning_rate': 1.8777197063090394e-05, 'epoch': 0.18} {'loss': 0.812, 'learning_rate': 1.877420969741454e-05, 'epoch': 0.18} {'loss': 0.8257, 'learning_rate': 1.877121892528838e-05, 'epoch': 0.18} {'loss': 0.8423, 'learning_rate': 1.876822474787303e-05, 'epoch': 0.18} {'loss': 0.8374, 'learning_rate': 1.8765227166330933e-05, 'epoch': 0.19} {'loss': 0.8618, 'learning_rate': 1.8762226181825857e-05, 'epoch': 0.19} {'loss': 0.8574, 'learning_rate': 1.875922179552288e-05, 'epoch': 0.19} {'loss': 0.8926, 'learning_rate': 1.875621400858842e-05, 'epoch': 0.19} [2024-01-30 20:40:55,917] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8774, 'learning_rate': 1.875320282219019e-05, 'epoch': 0.19} {'loss': 0.8291, 'learning_rate': 1.8750188237497247e-05, 'epoch': 0.19} [2024-01-30 20:41:30,657] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. 
if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8545, 'learning_rate': 1.874717025567995e-05, 'epoch': 0.19} {'loss': 0.7734, 'learning_rate': 1.874414887790999e-05, 'epoch': 0.19} [2024-01-30 20:42:05,257] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.873, 'learning_rate': 1.8741124105360363e-05, 'epoch': 0.19} {'loss': 0.8813, 'learning_rate': 1.873809593920539e-05, 'epoch': 0.19} [2024-01-30 20:42:41,215] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7769, 'learning_rate': 1.8735064380620717e-05, 'epoch': 0.19} [2024-01-30 20:42:58,970] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8735, 'learning_rate': 1.873202943078329e-05, 'epoch': 0.19} {'loss': 0.257, 'learning_rate': 1.8728991090871387e-05, 'epoch': 0.19} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1557987203.jpg' {'loss': 0.8394, 'learning_rate': 1.8725949362064596e-05, 'epoch': 0.19} {'loss': 0.8335, 'learning_rate': 1.8722904245543817e-05, 'epoch': 0.19} {'loss': 0.9404, 'learning_rate': 1.871985574249127e-05, 'epoch': 0.19} {'loss': 0.8452, 'learning_rate': 1.8716803854090495e-05, 'epoch': 0.19} [2024-01-30 20:44:53,191] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8169, 'learning_rate': 1.8713748581526334e-05, 'epoch': 0.19} [2024-01-30 20:45:10,601] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8154, 'learning_rate': 1.871068992598495e-05, 'epoch': 0.19} {'loss': 0.7939, 'learning_rate': 1.8707627888653816e-05, 'epoch': 0.19} {'loss': 0.833, 'learning_rate': 1.8704562470721728e-05, 'epoch': 0.19} {'loss': 0.8281, 'learning_rate': 1.870149367337878e-05, 'epoch': 0.19} {'loss': 0.8149, 'learning_rate': 1.8698421497816386e-05, 'epoch': 0.19} {'loss': 0.8584, 'learning_rate': 1.869534594522727e-05, 'epoch': 0.19} {'loss': 0.8579, 'learning_rate': 1.8692267016805473e-05, 'epoch': 0.19} {'loss': 0.8848, 'learning_rate': 1.8689184713746333e-05, 'epoch': 0.19} {'loss': 0.8926, 'learning_rate': 1.868609903724651e-05, 'epoch': 0.19} {'loss': 0.8281, 'learning_rate': 1.8683009988503972e-05, 'epoch': 0.19} {'loss': 0.8599, 'learning_rate': 1.867991756871799e-05, 'epoch': 0.19} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/471550833.jpg' {'loss': 0.8438, 'learning_rate': 1.867682177908915e-05, 'epoch': 0.19} {'loss': 0.9111, 'learning_rate': 1.867372262081934e-05, 'epoch': 0.19} {'loss': 0.8052, 'learning_rate': 1.8670620095111766e-05, 'epoch': 0.19} {'loss': 0.856, 'learning_rate': 1.8667514203170934e-05, 'epoch': 0.19} {'loss': 0.7539, 'learning_rate': 1.8664404946202658e-05, 'epoch': 0.19} {'loss': 0.8965, 'learning_rate': 1.8661292325414058e-05, 'epoch': 0.19} {'loss': 0.8022, 'learning_rate': 1.865817634201356e-05, 'epoch': 0.19} {'loss': 0.8384, 'learning_rate': 1.8655056997210893e-05, 'epoch': 0.19} {'loss': 0.8057, 'learning_rate': 1.8651934292217097e-05, 'epoch': 0.19} {'loss': 0.793, 'learning_rate': 1.864880822824452e-05, 'epoch': 0.19} {'loss': 0.812, 'learning_rate': 1.8645678806506795e-05, 'epoch': 0.19} {'loss': 0.8042, 'learning_rate': 1.864254602821888e-05, 'epoch': 0.19} {'loss': 0.2721, 'learning_rate': 1.8639409894597026e-05, 'epoch': 0.19} {'loss': 0.873, 'learning_rate': 1.8636270406858786e-05, 'epoch': 0.19} {'loss': 0.7871, 'learning_rate': 1.8633127566223023e-05, 'epoch': 0.19} {'loss': 0.8613, 'learning_rate': 1.862998137390989e-05, 'epoch': 0.19} {'loss': 0.8882, 'learning_rate': 1.8626831831140845e-05, 'epoch': 0.19} {'loss': 0.8428, 'learning_rate': 1.8623678939138652e-05, 'epoch': 0.19} {'loss': 0.8438, 'learning_rate': 1.8620522699127374e-05, 'epoch': 0.19} {'loss': 0.894, 'learning_rate': 1.8617363112332376e-05, 'epoch': 0.19} {'loss': 0.8311, 'learning_rate': 1.8614200179980307e-05, 'epoch': 0.19} {'loss': 0.8618, 'learning_rate': 1.8611033903299136e-05, 'epoch': 0.19} {'loss': 0.8579, 'learning_rate': 1.8607864283518116e-05, 'epoch': 0.19} {'loss': 0.309, 'learning_rate': 1.8604691321867804e-05, 'epoch': 0.2} {'loss': 0.7935, 'learning_rate': 1.8601515019580053e-05, 'epoch': 0.2} {'loss': 0.8828, 'learning_rate': 1.8598335377888012e-05, 'epoch': 0.2} {'loss': 0.8408, 'learning_rate': 1.8595152398026128e-05, 'epoch': 0.2} {'loss': 0.8672, 'learning_rate': 1.8591966081230142e-05, 'epoch': 0.2} {'loss': 0.8394, 'learning_rate': 1.8588776428737095e-05, 'epoch': 0.2} {'loss': 0.8569, 'learning_rate': 1.858558344178532e-05, 'epoch': 0.2} {'loss': 0.8599, 'learning_rate': 1.8582387121614437e-05, 'epoch': 0.2} {'loss': 0.8921, 'learning_rate': 1.857918746946538e-05, 'epoch': 0.2} {'loss': 0.8369, 'learning_rate': 1.8575984486580353e-05, 'epoch': 0.2} {'loss': 0.875, 'learning_rate': 1.857277817420287e-05, 'epoch': 0.2} {'loss': 0.8535, 
'learning_rate': 1.8569568533577727e-05, 'epoch': 0.2} {'loss': 0.896, 'learning_rate': 1.8566355565951023e-05, 'epoch': 0.2} {'loss': 0.814, 'learning_rate': 1.8563139272570142e-05, 'epoch': 0.2} [2024-01-30 21:02:45,918] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8574, 'learning_rate': 1.8559919654683756e-05, 'epoch': 0.2} {'loss': 0.916, 'learning_rate': 1.8556696713541833e-05, 'epoch': 0.2} [2024-01-30 21:03:20,602] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.874, 'learning_rate': 1.855347045039563e-05, 'epoch': 0.2} {'loss': 0.8652, 'learning_rate': 1.8550240866497697e-05, 'epoch': 0.2} [2024-01-30 21:03:57,343] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8394, 'learning_rate': 1.854700796310186e-05, 'epoch': 0.2} {'loss': 0.8804, 'learning_rate': 1.8543771741463254e-05, 'epoch': 0.2} {'loss': 0.9053, 'learning_rate': 1.8540532202838286e-05, 'epoch': 0.2} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/688151175.jpg' {'loss': 0.9048, 'learning_rate': 1.8537289348484658e-05, 'epoch': 0.2} {'loss': 0.8267, 'learning_rate': 1.8534043179661357e-05, 'epoch': 0.2} {'loss': 0.8364, 'learning_rate': 1.8530793697628658e-05, 'epoch': 0.2} {'loss': 0.7915, 'learning_rate': 1.8527540903648122e-05, 'epoch': 0.2} {'loss': 0.8662, 'learning_rate': 1.8524284798982595e-05, 'epoch': 0.2} {'loss': 0.8491, 'learning_rate': 1.852102538489621e-05, 'epoch': 0.2} {'loss': 0.8579, 'learning_rate': 1.8517762662654383e-05, 'epoch': 0.2} {'loss': 0.9414, 'learning_rate': 1.851449663352381e-05, 'epoch': 0.2} {'loss': 0.8477, 'learning_rate': 1.851122729877249e-05, 'epoch': 0.2} {'loss': 0.8125, 'learning_rate': 1.8507954659669677e-05, 'epoch': 0.2} {'loss': 0.8027, 'learning_rate': 1.850467871748593e-05, 'epoch': 0.2} {'loss': 0.7886, 'learning_rate': 1.850139947349308e-05, 'epoch': 0.2} {'loss': 0.8877, 'learning_rate': 1.8498116928964244e-05, 'epoch': 0.2} {'loss': 0.2549, 'learning_rate': 1.849483108517381e-05, 'epoch': 0.2} {'loss': 0.772, 'learning_rate': 1.849154194339747e-05, 'epoch': 0.2} {'loss': 0.8262, 'learning_rate': 1.8488249504912173e-05, 'epoch': 0.2} {'loss': 0.8296, 'learning_rate': 1.8484953770996163e-05, 'epoch': 0.2} {'loss': 0.8882, 'learning_rate': 1.848165474292895e-05, 'epoch': 0.2} [Errno 2] No such file or directory: 
'./playground/data/ocr_vqa/images/1564965112.jpg' {'loss': 0.8843, 'learning_rate': 1.8478352421991334e-05, 'epoch': 0.2} [2024-01-30 21:10:42,057] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8223, 'learning_rate': 1.847504680946539e-05, 'epoch': 0.2} [2024-01-30 21:11:01,560] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.2511, 'learning_rate': 1.847173790663447e-05, 'epoch': 0.2} {'loss': 0.8794, 'learning_rate': 1.8468425714783206e-05, 'epoch': 0.2} {'loss': 0.7979, 'learning_rate': 1.84651102351975e-05, 'epoch': 0.2} {'loss': 0.8574, 'learning_rate': 1.846179146916454e-05, 'epoch': 0.2} [2024-01-30 21:12:23,776] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8687, 'learning_rate': 1.8458469417972783e-05, 'epoch': 0.2} [2024-01-30 21:12:41,254] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8442, 'learning_rate': 1.8455144082911965e-05, 'epoch': 0.2} {'loss': 0.8726, 'learning_rate': 1.8451815465273097e-05, 'epoch': 0.2} {'loss': 0.8652, 'learning_rate': 1.8448483566348456e-05, 'epoch': 0.2} {'loss': 0.8838, 'learning_rate': 1.8445148387431605e-05, 'epoch': 0.2} {'loss': 0.2697, 'learning_rate': 1.8441809929817382e-05, 'epoch': 0.2} {'loss': 0.2742, 'learning_rate': 1.8438468194801876e-05, 'epoch': 0.2} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/3928819232.jpg' {'loss': 0.8433, 'learning_rate': 1.8435123183682475e-05, 'epoch': 0.21} {'loss': 0.8721, 'learning_rate': 1.8431774897757824e-05, 'epoch': 0.21} {'loss': 0.8228, 'learning_rate': 1.8428423338327847e-05, 'epoch': 0.21} {'loss': 0.8335, 'learning_rate': 1.8425068506693727e-05, 'epoch': 0.21} {'loss': 0.8159, 'learning_rate': 1.842171040415793e-05, 'epoch': 0.21} {'loss': 0.8169, 'learning_rate': 1.8418349032024185e-05, 'epoch': 0.21} {'loss': 0.8403, 'learning_rate': 1.8414984391597492e-05, 'epoch': 0.21} [2024-01-30 21:16:40,323] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7764, 'learning_rate': 1.8411616484184126e-05, 'epoch': 0.21} [2024-01-30 21:16:57,540] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8564, 'learning_rate': 1.8408245311091618e-05, 'epoch': 0.21} {'loss': 0.8267, 'learning_rate': 1.8404870873628774e-05, 'epoch': 0.21} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1571971459.jpg' {'loss': 0.8613, 'learning_rate': 1.8401493173105675e-05, 'epoch': 0.21} {'loss': 0.7925, 'learning_rate': 1.8398112210833648e-05, 'epoch': 0.21} {'loss': 0.9019, 'learning_rate': 1.8394727988125308e-05, 'epoch': 0.21} {'loss': 0.8105, 'learning_rate': 1.8391340506294524e-05, 'epoch': 0.21} {'loss': 0.8687, 'learning_rate': 1.8387949766656434e-05, 'epoch': 0.21} {'loss': 0.8535, 'learning_rate': 1.8384555770527438e-05, 'epoch': 0.21} {'loss': 0.8716, 'learning_rate': 1.8381158519225204e-05, 'epoch': 0.21} {'loss': 0.8486, 'learning_rate': 1.8377758014068662e-05, 'epoch': 0.21} {'loss': 0.8369, 'learning_rate': 1.8374354256378e-05, 'epoch': 0.21} {'loss': 0.8115, 'learning_rate': 1.837094724747468e-05, 'epoch': 0.21} {'loss': 0.8271, 'learning_rate': 1.8367536988681422e-05, 'epoch': 0.21} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/B001KBBD4A.jpg' {'loss': 0.8325, 'learning_rate': 1.83641234813222e-05, 'epoch': 0.21} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/771573936.jpg' {'loss': 0.8779, 'learning_rate': 1.8360706726722253e-05, 'epoch': 0.21} {'loss': 0.8438, 'learning_rate': 1.835728672620809e-05, 'epoch': 0.21} {'loss': 0.7725, 'learning_rate': 1.8353863481107473e-05, 'epoch': 0.21} {'loss': 0.8389, 'learning_rate': 1.835043699274942e-05, 'epoch': 0.21} {'loss': 0.8237, 'learning_rate': 1.8347007262464206e-05, 'epoch': 0.21} {'loss': 0.8086, 'learning_rate': 1.8343574291583385e-05, 'epoch': 0.21} {'loss': 0.8682, 'learning_rate': 1.8340138081439743e-05, 'epoch': 0.21} {'loss': 0.8228, 'learning_rate': 1.833669863336734e-05, 'epoch': 0.21} {'loss': 0.8228, 'learning_rate': 1.833325594870148e-05, 'epoch': 0.21} {'loss': 0.7891, 'learning_rate': 1.8329810028778747e-05, 'epoch': 0.21} {'loss': 0.7983, 'learning_rate': 1.8326360874936952e-05, 'epoch': 0.21} {'loss': 0.8442, 'learning_rate': 1.8322908488515182e-05, 'epoch': 0.21} {'loss': 0.8872, 'learning_rate': 1.8319452870853772e-05, 'epoch': 0.21} {'loss': 0.8569, 'learning_rate': 1.8315994023294306e-05, 'epoch': 0.21} {'loss': 0.9082, 'learning_rate': 1.8312531947179634e-05, 'epoch': 0.21} {'loss': 0.8335, 'learning_rate': 1.8309066643853854e-05, 'epoch': 0.21} {'loss': 0.8232, 'learning_rate': 1.8305598114662312e-05, 'epoch': 0.21} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/688118127.jpg' {'loss': 0.8706, 'learning_rate': 1.830212636095161e-05, 'epoch': 0.21} {'loss': 0.8511, 'learning_rate': 1.8298651384069605e-05, 'epoch': 0.21} {'loss': 0.8687, 'learning_rate': 1.8295173185365405e-05, 'epoch': 0.21} {'loss': 0.8418, 'learning_rate': 1.829169176618936e-05, 'epoch': 0.21} {'loss': 0.8872, 'learning_rate': 1.828820712789308e-05, 'epoch': 0.21} [2024-01-30 21:28:16,145] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8525, 'learning_rate': 1.828471927182942e-05, 'epoch': 0.21} [2024-01-30 21:28:39,028] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8057, 'learning_rate': 1.828122819935249e-05, 'epoch': 0.21} {'loss': 0.8477, 'learning_rate': 1.8277733911817642e-05, 'epoch': 0.21} {'loss': 0.3014, 'learning_rate': 1.8274236410581478e-05, 'epoch': 0.21} {'loss': 0.8589, 'learning_rate': 1.827073569700185e-05, 'epoch': 0.21} {'loss': 0.8315, 'learning_rate': 1.8267231772437854e-05, 'epoch': 0.21} {'loss': 0.8247, 'learning_rate': 1.8263724638249834e-05, 'epoch': 0.21} {'loss': 0.2534, 'learning_rate': 1.8260214295799382e-05, 'epoch': 0.21} {'loss': 0.2679, 'learning_rate': 1.825670074644933e-05, 'epoch': 0.22} {'loss': 0.8232, 'learning_rate': 1.8253183991563768e-05, 'epoch': 0.22} {'loss': 0.2803, 'learning_rate': 1.824966403250801e-05, 'epoch': 0.22} {'loss': 0.8174, 'learning_rate': 1.8246140870648633e-05, 'epoch': 0.22} {'loss': 0.8945, 'learning_rate': 1.8242614507353446e-05, 'epoch': 0.22} {'loss': 0.8345, 'learning_rate': 1.8239084943991507e-05, 'epoch': 0.22} {'loss': 0.7515, 'learning_rate': 1.823555218193311e-05, 'epoch': 0.22} {'loss': 0.8101, 'learning_rate': 1.8232016222549797e-05, 'epoch': 0.22} {'loss': 0.8486, 'learning_rate': 1.8228477067214352e-05, 'epoch': 0.22} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/958315434.jpg' {'loss': 0.8291, 'learning_rate': 1.8224934717300794e-05, 'epoch': 0.22} {'loss': 0.8301, 'learning_rate': 1.8221389174184385e-05, 'epoch': 0.22} {'loss': 0.7847, 'learning_rate': 1.8217840439241633e-05, 'epoch': 0.22} {'loss': 0.8545, 'learning_rate': 1.8214288513850267e-05, 'epoch': 0.22} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/810940183.jpg' {'loss': 0.33, 'learning_rate': 1.8210733399389277e-05, 'epoch': 0.22} {'loss': 0.8115, 'learning_rate': 1.820717509723888e-05, 'epoch': 0.22} {'loss': 0.2747, 'learning_rate': 1.8203613608780525e-05, 'epoch': 0.22} {'loss': 0.8843, 'learning_rate': 1.8200048935396908e-05, 'epoch': 0.22} [2024-01-30 21:36:09,390] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7671, 'learning_rate': 1.819648107847196e-05, 'epoch': 0.22} [2024-01-30 21:36:32,189] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8477, 'learning_rate': 1.8192910039390844e-05, 'epoch': 0.22} [2024-01-30 21:36:50,715] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8364, 'learning_rate': 1.8189335819539963e-05, 'epoch': 0.22} {'loss': 0.7969, 'learning_rate': 1.8185758420306947e-05, 'epoch': 0.22} [2024-01-30 21:37:25,131] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8008, 'learning_rate': 1.818217784308067e-05, 'epoch': 0.22} {'loss': 0.8628, 'learning_rate': 1.817859408925123e-05, 'epoch': 0.22} {'loss': 0.8477, 'learning_rate': 1.817500716020997e-05, 'epoch': 0.22} {'loss': 0.8413, 'learning_rate': 1.8171417057349457e-05, 'epoch': 0.22} {'loss': 0.8462, 'learning_rate': 1.816782378206349e-05, 'epoch': 0.22} {'loss': 0.29, 'learning_rate': 1.8164227335747108e-05, 'epoch': 0.22} {'loss': 0.7964, 'learning_rate': 1.8160627719796568e-05, 'epoch': 0.22} {'loss': 0.8623, 'learning_rate': 1.815702493560937e-05, 'epoch': 0.22} {'loss': 0.8501, 'learning_rate': 1.8153418984584238e-05, 'epoch': 0.22} {'loss': 0.8228, 'learning_rate': 1.8149809868121125e-05, 'epoch': 0.22} {'loss': 0.8047, 'learning_rate': 1.8146197587621217e-05, 'epoch': 0.22} {'loss': 0.2888, 'learning_rate': 1.814258214448692e-05, 'epoch': 0.22} {'loss': 0.8599, 'learning_rate': 1.8138963540121878e-05, 'epoch': 0.22} {'loss': 0.7749, 'learning_rate': 1.813534177593096e-05, 'epoch': 0.22} {'loss': 0.2903, 'learning_rate': 1.8131716853320254e-05, 'epoch': 0.22} {'loss': 0.8916, 'learning_rate': 1.8128088773697086e-05, 'epoch': 0.22} {'loss': 0.8799, 'learning_rate': 1.8124457538469996e-05, 'epoch': 0.22} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1570761493.jpg' {'loss': 0.8442, 'learning_rate': 1.8120823149048753e-05, 'epoch': 0.22} {'loss': 0.833, 'learning_rate': 1.811718560684436e-05, 'epoch': 0.22} {'loss': 0.9014, 'learning_rate': 1.8113544913269025e-05, 'epoch': 0.22} {'loss': 0.8103, 'learning_rate': 1.8109901069736202e-05, 'epoch': 0.22} {'loss': 0.8135, 'learning_rate': 1.8106254077660552e-05, 'epoch': 0.22} {'loss': 0.8623, 'learning_rate': 1.810260393845796e-05, 'epoch': 0.22} {'loss': 0.2528, 'learning_rate': 1.809895065354554e-05, 'epoch': 0.22} {'loss': 0.2594, 'learning_rate': 1.8095294224341622e-05, 'epoch': 0.22} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/892133252.jpg' {'loss': 0.8765, 'learning_rate': 1.8091634652265755e-05, 'epoch': 0.22} {'loss': 0.8374, 'learning_rate': 1.8087971938738715e-05, 'epoch': 0.22} {'loss': 0.8438, 'learning_rate': 1.808430608518249e-05, 'epoch': 0.22} 
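[editor's note] The recurring "[Errno 2] No such file or directory: './playground/data/ocr_vqa/images/...'" lines above indicate that some OCR-VQA images referenced by the instruction-tuning annotations are absent on disk, so those samples are being skipped or degraded at load time. A minimal pre-flight sketch like the one below can enumerate the missing files before launching a run. The annotation file name, the "image" field, and the folder layout are assumptions based on a LLaVA-style data mixture; they are not confirmed by this log and should be adjusted to the actual setup.

# Hypothetical pre-flight check for missing images (illustrative only).
# Assumes each record in the annotation JSON may carry an "image" field holding
# a path relative to the image folder, as in LLaVA-style mixtures.
import argparse
import json
import os

def find_missing_images(annotation_file: str, image_folder: str) -> list:
    """Return paths referenced in the annotations that do not exist on disk."""
    with open(annotation_file, "r") as f:
        records = json.load(f)
    missing = []
    for record in records:
        rel_path = record.get("image")
        if rel_path and not os.path.exists(os.path.join(image_folder, rel_path)):
            missing.append(rel_path)
    return missing

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # Defaults are illustrative guesses matching the paths seen in the errors above.
    parser.add_argument("--annotation-file", default="./playground/data/llava_v1_5_mix665k.json")
    parser.add_argument("--image-folder", default="./playground/data")
    args = parser.parse_args()
    missing = find_missing_images(args.annotation_file, args.image_folder)
    print(f"{len(missing)} referenced images are missing")
    for path in missing[:20]:
        print(path)

Running this before training (or re-downloading the OCR-VQA subset) would remove the per-step file errors rather than silently losing those samples.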
{'loss': 0.8545, 'learning_rate': 1.808063709302029e-05, 'epoch': 0.22} {'loss': 0.7954, 'learning_rate': 1.807696496367655e-05, 'epoch': 0.22} {'loss': 0.8584, 'learning_rate': 1.8073289698576913e-05, 'epoch': 0.22} {'loss': 0.2783, 'learning_rate': 1.8069611299148236e-05, 'epoch': 0.23} {'loss': 0.8726, 'learning_rate': 1.8065929766818617e-05, 'epoch': 0.23} {'loss': 0.8906, 'learning_rate': 1.806224510301734e-05, 'epoch': 0.23} {'loss': 0.7979, 'learning_rate': 1.8058557309174926e-05, 'epoch': 0.23} {'loss': 0.7988, 'learning_rate': 1.8054866386723096e-05, 'epoch': 0.23} {'loss': 0.8018, 'learning_rate': 1.80511723370948e-05, 'epoch': 0.23} {'loss': 0.262, 'learning_rate': 1.804747516172419e-05, 'epoch': 0.23} {'loss': 0.7832, 'learning_rate': 1.8043774862046644e-05, 'epoch': 0.23} {'loss': 0.8354, 'learning_rate': 1.804007143949874e-05, 'epoch': 0.23} {'loss': 0.8311, 'learning_rate': 1.8036364895518272e-05, 'epoch': 0.23} {'loss': 0.8647, 'learning_rate': 1.8032655231544253e-05, 'epoch': 0.23} {'loss': 0.7905, 'learning_rate': 1.8028942449016903e-05, 'epoch': 0.23} {'loss': 0.855, 'learning_rate': 1.8025226549377647e-05, 'epoch': 0.23} {'loss': 0.8716, 'learning_rate': 1.8021507534069133e-05, 'epoch': 0.23} {'loss': 0.8257, 'learning_rate': 1.8017785404535198e-05, 'epoch': 0.23} {'loss': 0.2607, 'learning_rate': 1.8014060162220916e-05, 'epoch': 0.23} {'loss': 0.7954, 'learning_rate': 1.801033180857254e-05, 'epoch': 0.23} {'loss': 0.8818, 'learning_rate': 1.8006600345037558e-05, 'epoch': 0.23} {'loss': 0.7974, 'learning_rate': 1.8002865773064644e-05, 'epoch': 0.23} {'loss': 0.8574, 'learning_rate': 1.799912809410369e-05, 'epoch': 0.23} {'loss': 0.8145, 'learning_rate': 1.799538730960579e-05, 'epoch': 0.23} {'loss': 0.8574, 'learning_rate': 1.799164342102325e-05, 'epoch': 0.23} {'loss': 0.8315, 'learning_rate': 1.7987896429809573e-05, 'epoch': 0.23} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/878576991.jpg' {'loss': 0.8022, 'learning_rate': 1.798414633741947e-05, 'epoch': 0.23} {'loss': 0.8193, 'learning_rate': 1.7980393145308857e-05, 'epoch': 0.23} {'loss': 0.8486, 'learning_rate': 1.797663685493485e-05, 'epoch': 0.23} {'loss': 0.8228, 'learning_rate': 1.7972877467755777e-05, 'epoch': 0.23} {'loss': 0.8345, 'learning_rate': 1.7969114985231152e-05, 'epoch': 0.23} [2024-01-30 21:55:37,624] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8203, 'learning_rate': 1.796534940882171e-05, 'epoch': 0.23} [2024-01-30 21:55:57,461] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8091, 'learning_rate': 1.7961580739989365e-05, 'epoch': 0.23} [2024-01-30 21:56:18,812] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. 
this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.833, 'learning_rate': 1.795780898019726e-05, 'epoch': 0.23} {'loss': 0.8369, 'learning_rate': 1.795403413090971e-05, 'epoch': 0.23} {'loss': 0.7612, 'learning_rate': 1.7950256193592243e-05, 'epoch': 0.23} {'loss': 0.2673, 'learning_rate': 1.794647516971159e-05, 'epoch': 0.23} [2024-01-30 21:57:40,553] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8491, 'learning_rate': 1.7942691060735666e-05, 'epoch': 0.23} [2024-01-30 21:57:58,692] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.811, 'learning_rate': 1.79389038681336e-05, 'epoch': 0.23} {'loss': 0.876, 'learning_rate': 1.7935113593375707e-05, 'epoch': 0.23} [2024-01-30 21:58:33,339] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8535, 'learning_rate': 1.7931320237933503e-05, 'epoch': 0.23} {'loss': 0.8555, 'learning_rate': 1.79275238032797e-05, 'epoch': 0.23} [2024-01-30 21:59:07,869] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8438, 'learning_rate': 1.7923724290888205e-05, 'epoch': 0.23} {'loss': 0.856, 'learning_rate': 1.791992170223412e-05, 'epoch': 0.23} {'loss': 0.8379, 'learning_rate': 1.791611603879374e-05, 'epoch': 0.23} [2024-01-30 22:00:04,336] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8169, 'learning_rate': 1.791230730204455e-05, 'epoch': 0.23} {'loss': 0.8979, 'learning_rate': 1.7908495493465236e-05, 'epoch': 0.23} [2024-01-30 22:00:43,751] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8062, 'learning_rate': 1.7904680614535675e-05, 'epoch': 0.23} {'loss': 0.7842, 'learning_rate': 1.7900862666736935e-05, 'epoch': 0.23} {'loss': 0.7959, 'learning_rate': 1.789704165155127e-05, 'epoch': 0.23} {'loss': 0.2626, 'learning_rate': 1.7893217570462134e-05, 'epoch': 0.23} {'loss': 0.9072, 'learning_rate': 1.7889390424954168e-05, 'epoch': 0.23} {'loss': 0.7759, 'learning_rate': 1.78855602165132e-05, 'epoch': 0.23} {'loss': 0.8276, 'learning_rate': 1.7881726946626244e-05, 'epoch': 0.23} {'loss': 0.8447, 'learning_rate': 1.787789061678151e-05, 'epoch': 0.23} {'loss': 0.8105, 'learning_rate': 1.78740512284684e-05, 'epoch': 0.24} {'loss': 0.8066, 'learning_rate': 1.787020878317749e-05, 'epoch': 0.24} {'loss': 0.7983, 'learning_rate': 1.7866363282400555e-05, 'epoch': 0.24} {'loss': 0.8145, 'learning_rate': 1.7862514727630543e-05, 'epoch': 0.24} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/870675656.jpg' {'loss': 0.2672, 'learning_rate': 1.7858663120361597e-05, 'epoch': 0.24} [2024-01-30 22:04:44,495] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8013, 'learning_rate': 1.785480846208905e-05, 'epoch': 0.24} [2024-01-30 22:05:04,030] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.2493, 'learning_rate': 1.7850950754309405e-05, 'epoch': 0.24} {'loss': 0.8174, 'learning_rate': 1.7847089998520365e-05, 'epoch': 0.24} [2024-01-30 22:05:38,421] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8921, 'learning_rate': 1.7843226196220803e-05, 'epoch': 0.24} {'loss': 0.8877, 'learning_rate': 1.783935934891078e-05, 'epoch': 0.24} {'loss': 0.8135, 'learning_rate': 1.7835489458091544e-05, 'epoch': 0.24} [2024-01-30 22:06:36,326] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8477, 'learning_rate': 1.7831616525265515e-05, 'epoch': 0.24} {'loss': 0.8296, 'learning_rate': 1.7827740551936296e-05, 'epoch': 0.24} {'loss': 0.8364, 'learning_rate': 1.7823861539608686e-05, 'epoch': 0.24} {'loss': 0.8271, 'learning_rate': 1.7819979489788638e-05, 'epoch': 0.24} {'loss': 0.7627, 'learning_rate': 1.7816094403983298e-05, 'epoch': 0.24} {'loss': 0.8105, 'learning_rate': 1.7812206283701002e-05, 'epoch': 0.24} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/780800370.jpg' {'loss': 0.8691, 'learning_rate': 1.7808315130451244e-05, 'epoch': 0.24} {'loss': 0.8428, 'learning_rate': 1.78044209457447e-05, 'epoch': 0.24} {'loss': 0.8525, 'learning_rate': 1.7800523731093232e-05, 'epoch': 0.24} {'loss': 0.7817, 'learning_rate': 1.7796623488009875e-05, 'epoch': 0.24} {'loss': 0.2352, 'learning_rate': 1.7792720218008826e-05, 'epoch': 0.24} {'loss': 0.7988, 'learning_rate': 1.7788813922605488e-05, 'epoch': 0.24} {'loss': 0.8066, 'learning_rate': 1.7784904603316402e-05, 'epoch': 0.24} {'loss': 0.8657, 'learning_rate': 1.7780992261659305e-05, 'epoch': 0.24} {'loss': 0.832, 'learning_rate': 1.777707689915311e-05, 'epoch': 0.24} {'loss': 0.8047, 'learning_rate': 1.777315851731789e-05, 'epoch': 0.24} {'loss': 0.8184, 'learning_rate': 1.7769237117674893e-05, 'epoch': 0.24} {'loss': 0.8584, 'learning_rate': 1.7765312701746543e-05, 'epoch': 0.24} {'loss': 0.8042, 'learning_rate': 1.7761385271056436e-05, 'epoch': 0.24} {'loss': 0.833, 'learning_rate': 1.7757454827129338e-05, 'epoch': 0.24} {'loss': 0.8931, 'learning_rate': 1.7753521371491174e-05, 'epoch': 0.24} {'loss': 0.8003, 'learning_rate': 1.7749584905669057e-05, 'epoch': 0.24} {'loss': 0.8223, 'learning_rate': 1.774564543119125e-05, 'epoch': 0.24} {'loss': 0.2393, 'learning_rate': 1.7741702949587196e-05, 'epoch': 0.24} {'loss': 0.8242, 'learning_rate': 1.7737757462387507e-05, 'epoch': 0.24} {'loss': 0.8359, 'learning_rate': 1.7733808971123946e-05, 'epoch': 0.24} {'loss': 0.8135, 'learning_rate': 1.7729857477329463e-05, 'epoch': 0.24} {'loss': 0.8271, 'learning_rate': 1.7725902982538162e-05, 'epoch': 0.24} {'loss': 0.8359, 'learning_rate': 1.772194548828531e-05, 'epoch': 0.24} {'loss': 0.8506, 'learning_rate': 1.7717984996107346e-05, 'epoch': 0.24} {'loss': 0.8589, 'learning_rate': 1.771402150754187e-05, 'epoch': 0.24} {'loss': 0.8271, 'learning_rate': 1.7710055024127637e-05, 'epoch': 0.24} {'loss': 0.8345, 'learning_rate': 1.7706085547404582e-05, 'epoch': 0.24} {'loss': 0.8218, 'learning_rate': 1.770211307891379e-05, 'epoch': 0.24} {'loss': 0.8413, 'learning_rate': 1.769813762019751e-05, 'epoch': 0.24} {'loss': 0.9023, 'learning_rate': 
1.769415917279915e-05, 'epoch': 0.24} {'loss': 0.8345, 'learning_rate': 1.7690177738263284e-05, 'epoch': 0.24} {'loss': 0.7642, 'learning_rate': 1.7686193318135635e-05, 'epoch': 0.24} {'loss': 0.8667, 'learning_rate': 1.76822059139631e-05, 'epoch': 0.24} {'loss': 0.2715, 'learning_rate': 1.7678215527293724e-05, 'epoch': 0.24} {'loss': 0.8403, 'learning_rate': 1.767422215967671e-05, 'epoch': 0.24} {'loss': 0.8037, 'learning_rate': 1.767022581266242e-05, 'epoch': 0.25} {'loss': 0.8115, 'learning_rate': 1.766622648780238e-05, 'epoch': 0.25} {'loss': 0.8433, 'learning_rate': 1.766222418664926e-05, 'epoch': 0.25} {'loss': 0.811, 'learning_rate': 1.765821891075689e-05, 'epoch': 0.25} {'loss': 0.792, 'learning_rate': 1.7654210661680263e-05, 'epoch': 0.25} [2024-01-30 22:21:32,058] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.832, 'learning_rate': 1.765019944097551e-05, 'epoch': 0.25} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1561701289.jpg' {'loss': 0.8188, 'learning_rate': 1.7646185250199936e-05, 'epoch': 0.25} [2024-01-30 22:22:11,537] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8008, 'learning_rate': 1.7642168090911976e-05, 'epoch': 0.25} {'loss': 0.793, 'learning_rate': 1.763814796467124e-05, 'epoch': 0.25} {'loss': 0.8403, 'learning_rate': 1.763412487303847e-05, 'epoch': 0.25} {'loss': 0.8306, 'learning_rate': 1.7630098817575578e-05, 'epoch': 0.25} {'loss': 0.8213, 'learning_rate': 1.762606979984561e-05, 'epoch': 0.25} {'loss': 0.8599, 'learning_rate': 1.7622037821412775e-05, 'epoch': 0.25} {'loss': 0.8384, 'learning_rate': 1.7618002883842426e-05, 'epoch': 0.25} {'loss': 0.8438, 'learning_rate': 1.7613964988701057e-05, 'epoch': 0.25} {'loss': 0.8623, 'learning_rate': 1.7609924137556326e-05, 'epoch': 0.25} {'loss': 0.2872, 'learning_rate': 1.7605880331977022e-05, 'epoch': 0.25} {'loss': 0.8096, 'learning_rate': 1.76018335735331e-05, 'epoch': 0.25} {'loss': 0.8325, 'learning_rate': 1.7597783863795644e-05, 'epoch': 0.25} {'loss': 0.8525, 'learning_rate': 1.7593731204336895e-05, 'epoch': 0.25} {'loss': 0.8896, 'learning_rate': 1.7589675596730233e-05, 'epoch': 0.25} {'loss': 0.2689, 'learning_rate': 1.758561704255018e-05, 'epoch': 0.25} {'loss': 0.8262, 'learning_rate': 1.7581555543372413e-05, 'epoch': 0.25} {'loss': 0.873, 'learning_rate': 1.7577491100773744e-05, 'epoch': 0.25} {'loss': 0.8853, 'learning_rate': 1.7573423716332128e-05, 'epoch': 0.25} {'loss': 0.2725, 'learning_rate': 1.7569353391626665e-05, 'epoch': 0.25} {'loss': 0.854, 'learning_rate': 1.7565280128237595e-05, 'epoch': 0.25} {'loss': 0.7979, 'learning_rate': 1.75612039277463e-05, 'epoch': 0.25} {'loss': 0.2777, 'learning_rate': 1.75571247917353e-05, 'epoch': 0.25} {'loss': 0.8486, 
'learning_rate': 1.7553042721788255e-05, 'epoch': 0.25} {'loss': 0.7983, 'learning_rate': 1.754895771948997e-05, 'epoch': 0.25} {'loss': 0.8682, 'learning_rate': 1.754486978642637e-05, 'epoch': 0.25} {'loss': 0.8062, 'learning_rate': 1.7540778924184553e-05, 'epoch': 0.25} {'loss': 0.8452, 'learning_rate': 1.7536685134352717e-05, 'epoch': 0.25} {'loss': 0.8188, 'learning_rate': 1.7532588418520215e-05, 'epoch': 0.25} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/737303360.jpg' {'loss': 0.2744, 'learning_rate': 1.7528488778277535e-05, 'epoch': 0.25} {'loss': 0.895, 'learning_rate': 1.75243862152163e-05, 'epoch': 0.25} {'loss': 0.8208, 'learning_rate': 1.752028073092926e-05, 'epoch': 0.25} {'loss': 0.8491, 'learning_rate': 1.7516172327010314e-05, 'epoch': 0.25} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1881174034.jpg' {'loss': 0.8223, 'learning_rate': 1.751206100505448e-05, 'epoch': 0.25} {'loss': 0.7939, 'learning_rate': 1.7507946766657914e-05, 'epoch': 0.25} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/70359148.jpg' {'loss': 0.8306, 'learning_rate': 1.7503829613417905e-05, 'epoch': 0.25} {'loss': 0.8306, 'learning_rate': 1.749970954693288e-05, 'epoch': 0.25} {'loss': 0.832, 'learning_rate': 1.7495586568802384e-05, 'epoch': 0.25} {'loss': 0.8555, 'learning_rate': 1.7491460680627105e-05, 'epoch': 0.25} {'loss': 0.8364, 'learning_rate': 1.7487331884008845e-05, 'epoch': 0.25} {'loss': 0.8701, 'learning_rate': 1.7483200180550554e-05, 'epoch': 0.25} {'loss': 0.834, 'learning_rate': 1.74790655718563e-05, 'epoch': 0.25} {'loss': 0.8101, 'learning_rate': 1.747492805953128e-05, 'epoch': 0.25} {'loss': 0.8027, 'learning_rate': 1.7470787645181818e-05, 'epoch': 0.25} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1860340660.jpg' {'loss': 0.834, 'learning_rate': 1.7466644330415362e-05, 'epoch': 0.25} {'loss': 0.2521, 'learning_rate': 1.7462498116840496e-05, 'epoch': 0.25} {'loss': 0.792, 'learning_rate': 1.745834900606692e-05, 'epoch': 0.26} {'loss': 0.7925, 'learning_rate': 1.7454196999705458e-05, 'epoch': 0.26} {'loss': 0.7964, 'learning_rate': 1.7450042099368066e-05, 'epoch': 0.26} {'loss': 0.8794, 'learning_rate': 1.7445884306667823e-05, 'epoch': 0.26} {'loss': 0.8213, 'learning_rate': 1.7441723623218917e-05, 'epoch': 0.26} {'loss': 0.7935, 'learning_rate': 1.7437560050636678e-05, 'epoch': 0.26} {'loss': 0.9028, 'learning_rate': 1.7433393590537543e-05, 'epoch': 0.26} {'loss': 0.7949, 'learning_rate': 1.7429224244539077e-05, 'epoch': 0.26} {'loss': 0.8149, 'learning_rate': 1.7425052014259965e-05, 'epoch': 0.26} {'loss': 0.8462, 'learning_rate': 1.7420876901320006e-05, 'epoch': 0.26} {'loss': 0.8232, 'learning_rate': 1.7416698907340128e-05, 'epoch': 0.26} {'loss': 0.8516, 'learning_rate': 1.741251803394237e-05, 'epoch': 0.26} {'loss': 0.8721, 'learning_rate': 1.740833428274989e-05, 'epoch': 0.26} {'loss': 0.8301, 'learning_rate': 1.7404147655386966e-05, 'epoch': 0.26} {'loss': 0.8335, 'learning_rate': 1.739995815347899e-05, 'epoch': 0.26} {'loss': 0.813, 'learning_rate': 1.739576577865247e-05, 'epoch': 0.26} {'loss': 0.8633, 'learning_rate': 1.739157053253503e-05, 'epoch': 0.26} {'loss': 0.8237, 'learning_rate': 1.738737241675541e-05, 'epoch': 0.26} {'loss': 0.8115, 'learning_rate': 1.7383171432943466e-05, 'epoch': 0.26} {'loss': 0.8301, 'learning_rate': 1.737896758273016e-05, 'epoch': 0.26} {'loss': 0.8022, 'learning_rate': 1.7374760867747574e-05, 'epoch': 0.26} {'loss': 0.8672, 'learning_rate': 
1.7370551289628895e-05, 'epoch': 0.26} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/471178705.jpg' {'loss': 0.8315, 'learning_rate': 1.7366338850008432e-05, 'epoch': 0.26} {'loss': 0.8179, 'learning_rate': 1.73621235505216e-05, 'epoch': 0.26} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/671567918.jpg' {'loss': 0.7969, 'learning_rate': 1.7357905392804918e-05, 'epoch': 0.26} {'loss': 0.8555, 'learning_rate': 1.735368437849602e-05, 'epoch': 0.26} {'loss': 0.8145, 'learning_rate': 1.7349460509233654e-05, 'epoch': 0.26} {'loss': 0.2485, 'learning_rate': 1.734523378665767e-05, 'epoch': 0.26} {'loss': 0.8027, 'learning_rate': 1.7341004212409026e-05, 'epoch': 0.26} {'loss': 0.8364, 'learning_rate': 1.7336771788129785e-05, 'epoch': 0.26} {'loss': 0.8809, 'learning_rate': 1.7332536515463126e-05, 'epoch': 0.26} {'loss': 0.9038, 'learning_rate': 1.7328298396053324e-05, 'epoch': 0.26} {'loss': 0.8838, 'learning_rate': 1.7324057431545768e-05, 'epoch': 0.26} {'loss': 0.8389, 'learning_rate': 1.7319813623586935e-05, 'epoch': 0.26} {'loss': 0.8579, 'learning_rate': 1.7315566973824433e-05, 'epoch': 0.26} {'loss': 0.8535, 'learning_rate': 1.7311317483906946e-05, 'epoch': 0.26} {'loss': 0.7969, 'learning_rate': 1.730706515548427e-05, 'epoch': 0.26} {'loss': 0.8481, 'learning_rate': 1.730280999020732e-05, 'epoch': 0.26} {'loss': 0.8296, 'learning_rate': 1.729855198972808e-05, 'epoch': 0.26} {'loss': 0.7954, 'learning_rate': 1.729429115569967e-05, 'epoch': 0.26} {'loss': 0.8359, 'learning_rate': 1.729002748977628e-05, 'epoch': 0.26} {'loss': 0.8555, 'learning_rate': 1.7285760993613215e-05, 'epoch': 0.26} {'loss': 0.79, 'learning_rate': 1.7281491668866874e-05, 'epoch': 0.26} {'loss': 0.8706, 'learning_rate': 1.727721951719476e-05, 'epoch': 0.26} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/706377273.jpg' {'loss': 0.8218, 'learning_rate': 1.7272944540255468e-05, 'epoch': 0.26} [2024-01-30 22:49:56,296] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8296, 'learning_rate': 1.726866673970869e-05, 'epoch': 0.26} {'loss': 0.8628, 'learning_rate': 1.7264386117215216e-05, 'epoch': 0.26} {'loss': 0.8584, 'learning_rate': 1.7260102674436933e-05, 'epoch': 0.26} {'loss': 0.7793, 'learning_rate': 1.7255816413036818e-05, 'epoch': 0.26} {'loss': 0.2759, 'learning_rate': 1.7251527334678946e-05, 'epoch': 0.26} {'loss': 0.8872, 'learning_rate': 1.7247235441028486e-05, 'epoch': 0.26} {'loss': 0.2719, 'learning_rate': 1.7242940733751696e-05, 'epoch': 0.26} {'loss': 0.832, 'learning_rate': 1.7238643214515934e-05, 'epoch': 0.27} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/393090027.jpg' {'loss': 0.8291, 'learning_rate': 1.7234342884989642e-05, 'epoch': 0.27} {'loss': 0.873, 'learning_rate': 1.7230039746842352e-05, 'epoch': 0.27} {'loss': 0.8828, 'learning_rate': 1.7225733801744698e-05, 'epoch': 0.27} {'loss': 0.8423, 'learning_rate': 1.7221425051368394e-05, 'epoch': 0.27} {'loss': 0.7651, 'learning_rate': 1.7217113497386245e-05, 'epoch': 0.27} {'loss': 0.8164, 'learning_rate': 1.721279914147214e-05, 'epoch': 0.27} {'loss': 0.8638, 'learning_rate': 1.7208481985301065e-05, 'epoch': 0.27} {'loss': 0.7979, 'learning_rate': 1.7204162030549093e-05, 'epoch': 0.27} {'loss': 0.8228, 'learning_rate': 1.7199839278893368e-05, 'epoch': 0.27} {'loss': 0.834, 'learning_rate': 1.719551373201214e-05, 'epoch': 0.27} {'loss': 0.835, 'learning_rate': 1.7191185391584736e-05, 'epoch': 0.27} {'loss': 0.7817, 'learning_rate': 1.7186854259291558e-05, 'epoch': 0.27} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1576730867.jpg' {'loss': 0.2832, 'learning_rate': 1.7182520336814105e-05, 'epoch': 0.27} {'loss': 0.8105, 'learning_rate': 1.717818362583496e-05, 'epoch': 0.27} {'loss': 0.7754, 'learning_rate': 1.7173844128037777e-05, 'epoch': 0.27} {'loss': 0.8374, 'learning_rate': 1.71695018451073e-05, 'epoch': 0.27} {'loss': 0.3041, 'learning_rate': 1.7165156778729355e-05, 'epoch': 0.27} {'loss': 0.8237, 'learning_rate': 1.7160808930590845e-05, 'epoch': 0.27} {'loss': 0.8936, 'learning_rate': 1.7156458302379753e-05, 'epoch': 0.27} {'loss': 0.2682, 'learning_rate': 1.7152104895785147e-05, 'epoch': 0.27} [2024-01-30 22:58:41,605] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8145, 'learning_rate': 1.7147748712497162e-05, 'epoch': 0.27} {'loss': 0.834, 'learning_rate': 1.7143389754207026e-05, 'epoch': 0.27} {'loss': 0.8057, 'learning_rate': 1.713902802260703e-05, 'epoch': 0.27} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1568590644.jpg' {'loss': 0.8198, 'learning_rate': 1.7134663519390557e-05, 'epoch': 0.27} {'loss': 0.832, 'learning_rate': 1.7130296246252048e-05, 'epoch': 0.27} {'loss': 0.7744, 'learning_rate': 1.7125926204887034e-05, 'epoch': 0.27} {'loss': 0.8267, 'learning_rate': 1.712155339699211e-05, 'epoch': 0.27} {'loss': 0.811, 'learning_rate': 1.7117177824264962e-05, 'epoch': 0.27} {'loss': 0.8022, 'learning_rate': 1.7112799488404327e-05, 'epoch': 0.27} {'loss': 0.8096, 'learning_rate': 1.7108418391110033e-05, 'epoch': 0.27} {'loss': 0.8203, 'learning_rate': 1.7104034534082968e-05, 'epoch': 0.27} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/B00XLX3W9O.jpg' {'loss': 0.8291, 'learning_rate': 1.7099647919025096e-05, 'epoch': 0.27} [2024-01-30 23:02:38,575] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8154, 'learning_rate': 1.7095258547639456e-05, 'epoch': 0.27} [2024-01-30 23:02:55,849] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8413, 'learning_rate': 1.709086642163015e-05, 'epoch': 0.27} {'loss': 0.7935, 'learning_rate': 1.7086471542702355e-05, 'epoch': 0.27} {'loss': 0.8398, 'learning_rate': 1.708207391256231e-05, 'epoch': 0.27} {'loss': 0.2758, 'learning_rate': 1.707767353291733e-05, 'epoch': 0.27} {'loss': 0.8477, 'learning_rate': 1.7073270405475796e-05, 'epoch': 0.27} {'loss': 0.8276, 'learning_rate': 1.7068864531947147e-05, 'epoch': 0.27} {'loss': 0.7876, 'learning_rate': 1.70644559140419e-05, 'epoch': 0.27} {'loss': 0.8154, 'learning_rate': 1.706004455347163e-05, 'epoch': 0.27} {'loss': 0.8223, 'learning_rate': 1.705563045194898e-05, 'epoch': 0.27} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/394532643.jpg' {'loss': 0.8003, 'learning_rate': 1.7051213611187657e-05, 'epoch': 0.27} {'loss': 0.2471, 'learning_rate': 1.704679403290243e-05, 'epoch': 0.27} {'loss': 0.7993, 'learning_rate': 1.7042371718809132e-05, 'epoch': 0.27} {'loss': 0.8101, 'learning_rate': 1.7037946670624652e-05, 'epoch': 0.27} {'loss': 0.8066, 'learning_rate': 1.7033518890066956e-05, 'epoch': 0.27} {'loss': 0.2941, 'learning_rate': 1.7029088378855055e-05, 'epoch': 0.27} {'loss': 0.29, 'learning_rate': 1.7024655138709025e-05, 'epoch': 0.27} {'loss': 0.8335, 'learning_rate': 1.7020219171350004e-05, 'epoch': 0.27} {'loss': 0.8228, 'learning_rate': 1.7015780478500187e-05, 'epoch': 0.27} {'loss': 0.832, 'learning_rate': 1.701133906188283e-05, 'epoch': 0.28} {'loss': 0.8291, 'learning_rate': 1.700689492322224e-05, 'epoch': 0.28} {'loss': 0.8989, 'learning_rate': 1.700244806424379e-05, 'epoch': 0.28} {'loss': 0.8384, 'learning_rate': 1.6997998486673893e-05, 'epoch': 0.28} {'loss': 0.8228, 'learning_rate': 1.699354619224004e-05, 'epoch': 0.28} {'loss': 0.7886, 'learning_rate': 1.698909118267076e-05, 'epoch': 0.28} {'loss': 0.79, 'learning_rate': 1.6984633459695646e-05, 'epoch': 0.28} {'loss': 0.8325, 'learning_rate': 1.6980173025045328e-05, 'epoch': 0.28} {'loss': 0.9097, 'learning_rate': 1.697570988045151e-05, 'epoch': 0.28} {'loss': 0.2743, 'learning_rate': 1.6971244027646937e-05, 'epoch': 0.28} {'loss': 0.8569, 'learning_rate': 1.69667754683654e-05, 'epoch': 0.28} {'loss': 0.2904, 'learning_rate': 1.6962304204341758e-05, 'epoch': 0.28} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/471024961.jpg' {'loss': 0.8379, 'learning_rate': 1.6957830237311904e-05, 'epoch': 0.28} {'loss': 0.8823, 'learning_rate': 1.6953353569012784e-05, 'epoch': 0.28} {'loss': 0.8008, 'learning_rate': 1.6948874201182402e-05, 'epoch': 0.28} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/739704516.jpg' {'loss': 0.8418, 'learning_rate': 1.6944392135559798e-05, 'epoch': 0.28} {'loss': 0.9053, 'learning_rate': 1.6939907373885062e-05, 'epoch': 0.28} {'loss': 0.8359, 'learning_rate': 1.6935419917899335e-05, 'epoch': 0.28} {'loss': 0.8594, 'learning_rate': 1.6930929769344807e-05, 'epoch': 0.28} {'loss': 0.854, 'learning_rate': 1.69264369299647e-05, 'epoch': 0.28} {'loss': 0.8438, 'learning_rate': 1.692194140150329e-05, 'epoch': 0.28} {'loss': 0.8516, 'learning_rate': 1.69174431857059e-05, 'epoch': 0.28} {'loss': 0.8579, 'learning_rate': 1.6912942284318898e-05, 'epoch': 0.28} {'loss': 0.833, 'learning_rate': 1.6908438699089674e-05, 'epoch': 0.28} {'loss': 0.853, 'learning_rate': 1.690393243176668e-05, 'epoch': 
0.28} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/830816011.jpg' {'loss': 0.7808, 'learning_rate': 1.6899423484099413e-05, 'epoch': 0.28} {'loss': 0.8872, 'learning_rate': 1.6894911857838394e-05, 'epoch': 0.28} {'loss': 0.8521, 'learning_rate': 1.689039755473519e-05, 'epoch': 0.28} {'loss': 0.8032, 'learning_rate': 1.6885880576542417e-05, 'epoch': 0.28} {'loss': 0.7441, 'learning_rate': 1.6881360925013712e-05, 'epoch': 0.28} {'loss': 0.8267, 'learning_rate': 1.6876838601903765e-05, 'epoch': 0.28} {'loss': 0.8018, 'learning_rate': 1.6872313608968296e-05, 'epoch': 0.28} {'loss': 0.7622, 'learning_rate': 1.6867785947964065e-05, 'epoch': 0.28} {'loss': 0.8423, 'learning_rate': 1.6863255620648866e-05, 'epoch': 0.28} {'loss': 0.8125, 'learning_rate': 1.685872262878152e-05, 'epoch': 0.28} {'loss': 0.8242, 'learning_rate': 1.6854186974121903e-05, 'epoch': 0.28} {'loss': 0.9004, 'learning_rate': 1.68496486584309e-05, 'epoch': 0.28} {'loss': 0.8569, 'learning_rate': 1.6845107683470453e-05, 'epoch': 0.28} {'loss': 0.278, 'learning_rate': 1.6840564051003517e-05, 'epoch': 0.28} {'loss': 0.8174, 'learning_rate': 1.6836017762794087e-05, 'epoch': 0.28} {'loss': 0.2828, 'learning_rate': 1.6831468820607192e-05, 'epoch': 0.28} {'loss': 0.7881, 'learning_rate': 1.6826917226208886e-05, 'epoch': 0.28} {'loss': 0.9033, 'learning_rate': 1.6822362981366257e-05, 'epoch': 0.28} {'loss': 0.7866, 'learning_rate': 1.6817806087847417e-05, 'epoch': 0.28} [2024-01-30 23:22:02,125] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7856, 'learning_rate': 1.681324654742151e-05, 'epoch': 0.28} {'loss': 0.7896, 'learning_rate': 1.6808684361858706e-05, 'epoch': 0.28} [2024-01-30 23:22:40,782] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8433, 'learning_rate': 1.6804119532930202e-05, 'epoch': 0.28} {'loss': 0.8252, 'learning_rate': 1.6799552062408225e-05, 'epoch': 0.28} [2024-01-30 23:23:15,143] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8491, 'learning_rate': 1.6794981952066018e-05, 'epoch': 0.28} {'loss': 0.8364, 'learning_rate': 1.6790409203677863e-05, 'epoch': 0.28} {'loss': 0.2865, 'learning_rate': 1.6785833819019052e-05, 'epoch': 0.28} {'loss': 0.8276, 'learning_rate': 1.678125579986591e-05, 'epoch': 0.28} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/750219378.jpg' {'loss': 0.8198, 'learning_rate': 1.677667514799578e-05, 'epoch': 0.29} {'loss': 0.8184, 'learning_rate': 1.6772091865187032e-05, 'epoch': 0.29} {'loss': 0.8423, 'learning_rate': 1.676750595321905e-05, 'epoch': 0.29} {'loss': 0.8398, 'learning_rate': 1.6762917413872246e-05, 'epoch': 0.29} {'loss': 0.7925, 'learning_rate': 1.675832624892805e-05, 'epoch': 0.29} {'loss': 0.8203, 'learning_rate': 1.6753732460168907e-05, 'epoch': 0.29} {'loss': 0.7891, 'learning_rate': 1.674913604937828e-05, 'epoch': 0.29} {'loss': 0.8008, 'learning_rate': 1.6744537018340662e-05, 'epoch': 0.29} {'loss': 0.8628, 'learning_rate': 1.6739935368841555e-05, 'epoch': 0.29} {'loss': 0.8066, 'learning_rate': 1.6735331102667475e-05, 'epoch': 0.29} {'loss': 0.8433, 'learning_rate': 1.6730724221605955e-05, 'epoch': 0.29} {'loss': 0.8701, 'learning_rate': 1.6726114727445547e-05, 'epoch': 0.29} {'loss': 0.8384, 'learning_rate': 1.6721502621975813e-05, 'epoch': 0.29} [2024-01-30 23:28:23,141] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7812, 'learning_rate': 1.6716887906987332e-05, 'epoch': 0.29} {'loss': 0.8608, 'learning_rate': 1.6712270584271703e-05, 'epoch': 0.29} {'loss': 0.8198, 'learning_rate': 1.670765065562152e-05, 'epoch': 0.29} {'loss': 0.8159, 'learning_rate': 1.67030281228304e-05, 'epoch': 0.29} {'loss': 0.2904, 'learning_rate': 1.6698402987692968e-05, 'epoch': 0.29} {'loss': 0.7856, 'learning_rate': 1.6693775252004866e-05, 'epoch': 0.29} {'loss': 0.8379, 'learning_rate': 1.668914491756274e-05, 'epoch': 0.29} [2024-01-30 23:30:38,010] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8164, 'learning_rate': 1.668451198616424e-05, 'epoch': 0.29} [2024-01-30 23:30:57,254] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8057, 'learning_rate': 1.6679876459608033e-05, 'epoch': 0.29} [2024-01-30 23:31:15,940] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.813, 'learning_rate': 1.667523833969379e-05, 'epoch': 0.29} {'loss': 0.8354, 'learning_rate': 1.667059762822219e-05, 'epoch': 0.29} [2024-01-30 23:31:56,329] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8408, 'learning_rate': 1.666595432699491e-05, 'epoch': 0.29} {'loss': 0.7432, 'learning_rate': 1.6661308437814652e-05, 'epoch': 0.29} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/292796048.jpg' [2024-01-30 23:32:31,379] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8647, 'learning_rate': 1.6656659962485097e-05, 'epoch': 0.29} {'loss': 0.8101, 'learning_rate': 1.6652008902810952e-05, 'epoch': 0.29} {'loss': 0.7358, 'learning_rate': 1.6647355260597915e-05, 'epoch': 0.29} {'loss': 0.8281, 'learning_rate': 1.664269903765269e-05, 'epoch': 0.29} {'loss': 0.8037, 'learning_rate': 1.6638040235782983e-05, 'epoch': 0.29} [2024-01-30 23:34:03,687] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7871, 'learning_rate': 1.6633378856797505e-05, 'epoch': 0.29} {'loss': 0.8364, 'learning_rate': 1.662871490250596e-05, 'epoch': 0.29} {'loss': 0.8569, 'learning_rate': 1.662404837471905e-05, 'epoch': 0.29} {'loss': 0.8647, 'learning_rate': 1.66193792752485e-05, 'epoch': 0.29} {'loss': 0.8491, 'learning_rate': 1.6614707605906995e-05, 'epoch': 0.29} {'loss': 0.8735, 'learning_rate': 1.661003336850825e-05, 'epoch': 0.29} {'loss': 0.855, 'learning_rate': 1.660535656486696e-05, 'epoch': 0.29} {'loss': 0.8135, 'learning_rate': 1.660067719679882e-05, 'epoch': 0.29} {'loss': 0.8745, 'learning_rate': 1.6595995266120528e-05, 'epoch': 0.29} {'loss': 0.8569, 'learning_rate': 1.6591310774649766e-05, 'epoch': 0.29} {'loss': 0.8491, 'learning_rate': 1.6586623724205216e-05, 'epoch': 0.29} {'loss': 0.8491, 'learning_rate': 1.6581934116606554e-05, 'epoch': 0.29} {'loss': 0.8159, 'learning_rate': 1.657724195367444e-05, 'epoch': 0.29} WARNING: tokenization mismatch: 1 vs. 1440. (ignored) {'loss': 0.7866, 'learning_rate': 1.657254723723054e-05, 'epoch': 0.29} {'loss': 0.8242, 'learning_rate': 1.6567849969097505e-05, 'epoch': 0.29} {'loss': 0.8301, 'learning_rate': 1.6563150151098973e-05, 'epoch': 0.29} {'loss': 0.8652, 'learning_rate': 1.6558447785059577e-05, 'epoch': 0.29} {'loss': 0.771, 'learning_rate': 1.655374287280494e-05, 'epoch': 0.29} [2024-01-30 23:39:24,393] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8252, 'learning_rate': 1.6549035416161662e-05, 'epoch': 0.29} [2024-01-30 23:39:42,717] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7817, 'learning_rate': 1.654432541695735e-05, 'epoch': 0.29} {'loss': 0.8511, 'learning_rate': 1.653961287702058e-05, 'epoch': 0.29} {'loss': 0.8408, 'learning_rate': 1.653489779818093e-05, 'epoch': 0.3} {'loss': 0.2574, 'learning_rate': 1.6530180182268946e-05, 'epoch': 0.3} {'loss': 0.8589, 'learning_rate': 1.652546003111618e-05, 'epoch': 0.3} {'loss': 0.8574, 'learning_rate': 1.652073734655515e-05, 'epoch': 0.3} {'loss': 0.8203, 'learning_rate': 1.6516012130419366e-05, 'epoch': 0.3} {'loss': 0.8174, 'learning_rate': 1.6511284384543317e-05, 'epoch': 0.3} {'loss': 0.7788, 'learning_rate': 1.6506554110762483e-05, 'epoch': 0.3} {'loss': 0.2889, 'learning_rate': 1.650182131091332e-05, 'epoch': 0.3} {'loss': 0.8223, 'learning_rate': 1.6497085986833252e-05, 'epoch': 0.3} {'loss': 0.8496, 'learning_rate': 1.6492348140360704e-05, 'epoch': 0.3} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/465069347.jpg' {'loss': 0.2604, 'learning_rate': 1.6487607773335074e-05, 'epoch': 0.3} {'loss': 0.8013, 'learning_rate': 1.648286488759673e-05, 'epoch': 0.3} {'loss': 0.8677, 'learning_rate': 1.6478119484987026e-05, 'epoch': 0.3} {'loss': 0.8516, 'learning_rate': 1.6473371567348287e-05, 'epoch': 0.3} {'loss': 0.7925, 'learning_rate': 1.6468621136523823e-05, 'epoch': 0.3} [2024-01-30 23:44:55,517] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8613, 'learning_rate': 1.646386819435791e-05, 'epoch': 0.3} {'loss': 0.8027, 'learning_rate': 1.6459112742695807e-05, 'epoch': 0.3} {'loss': 0.7764, 'learning_rate': 1.6454354783383748e-05, 'epoch': 0.3} {'loss': 0.8555, 'learning_rate': 1.644959431826893e-05, 'epoch': 0.3} {'loss': 0.7808, 'learning_rate': 1.6444831349199528e-05, 'epoch': 0.3} {'loss': 0.8223, 'learning_rate': 1.6440065878024697e-05, 'epoch': 0.3} {'loss': 0.8496, 'learning_rate': 1.6435297906594553e-05, 'epoch': 0.3} [2024-01-30 23:47:10,002] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8633, 'learning_rate': 1.643052743676019e-05, 'epoch': 0.3} [2024-01-30 23:47:36,118] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8164, 'learning_rate': 1.6425754470373667e-05, 'epoch': 0.3} {'loss': 0.8218, 'learning_rate': 1.642097900928801e-05, 'epoch': 0.3} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/963479903.jpg' {'loss': 0.7754, 'learning_rate': 1.6416201055357225e-05, 'epoch': 0.3} {'loss': 0.8398, 'learning_rate': 1.641142061043627e-05, 'epoch': 0.3} {'loss': 0.8398, 'learning_rate': 1.640663767638108e-05, 'epoch': 0.3} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/155850835X.jpg' {'loss': 0.8408, 'learning_rate': 1.6401852255048564e-05, 'epoch': 0.3} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/B005FOFNA8.jpg' {'loss': 0.8481, 'learning_rate': 1.6397064348296578e-05, 'epoch': 0.3} {'loss': 0.8418, 'learning_rate': 1.6392273957983955e-05, 'epoch': 0.3} {'loss': 0.8535, 'learning_rate': 1.638748108597049e-05, 'epoch': 0.3} {'loss': 0.8682, 'learning_rate': 1.6382685734116934e-05, 'epoch': 0.3} {'loss': 0.8628, 'learning_rate': 1.6377887904285018e-05, 'epoch': 0.3} {'loss': 0.8525, 'learning_rate': 1.637308759833742e-05, 'epoch': 0.3} {'loss': 0.7881, 'learning_rate': 1.6368284818137787e-05, 'epoch': 0.3} {'loss': 0.8804, 'learning_rate': 1.636347956555072e-05, 'epoch': 0.3} {'loss': 0.7405, 'learning_rate': 1.635867184244178e-05, 'epoch': 0.3} {'loss': 0.8599, 'learning_rate': 1.63538616506775e-05, 'epoch': 0.3} {'loss': 0.8569, 'learning_rate': 1.6349048992125358e-05, 'epoch': 0.3} [2024-01-30 23:53:05,799] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7896, 'learning_rate': 1.634423386865379e-05, 'epoch': 0.3} [2024-01-30 23:53:26,979] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7935, 'learning_rate': 1.6339416282132196e-05, 'epoch': 0.3} [2024-01-30 23:53:43,731] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
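The repeated [Errno 2] messages above point to OCR-VQA images that the annotations reference but that are missing under ./playground/data/ocr_vqa/images/. A minimal pre-flight sketch for listing them before launching training, assuming the usual LLaVA-style data layout; the annotation filename and the "image" field below are assumptions, not taken from this run:

```python
# Hypothetical pre-flight check: report OCR-VQA samples whose image file is
# missing on disk, so they can be re-downloaded or filtered out of the mixture.
import json
import os

DATA_ROOT = "./playground/data"
ANNOTATIONS = os.path.join(DATA_ROOT, "llava_v1_5_mix665k.json")  # assumed filename

with open(ANNOTATIONS) as f:
    samples = json.load(f)

missing = [
    s["image"]
    for s in samples
    if "image" in s
    and s["image"].startswith("ocr_vqa/")
    and not os.path.exists(os.path.join(DATA_ROOT, s["image"]))
]

print(f"{len(missing)} OCR-VQA images referenced but not found; first few:")
for path in missing[:20]:
    print(" ", path)
```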
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8018, 'learning_rate': 1.633459623443093e-05, 'epoch': 0.3} {'loss': 0.8511, 'learning_rate': 1.6329773727421297e-05, 'epoch': 0.3} {'loss': 0.8413, 'learning_rate': 1.6324948762975567e-05, 'epoch': 0.3} {'loss': 0.7959, 'learning_rate': 1.632012134296695e-05, 'epoch': 0.3} {'loss': 0.8345, 'learning_rate': 1.6315291469269617e-05, 'epoch': 0.3} {'loss': 0.8799, 'learning_rate': 1.63104591437587e-05, 'epoch': 0.3} {'loss': 0.8345, 'learning_rate': 1.6305624368310265e-05, 'epoch': 0.3} {'loss': 0.8672, 'learning_rate': 1.630078714480134e-05, 'epoch': 0.3} {'loss': 0.8071, 'learning_rate': 1.6295947475109904e-05, 'epoch': 0.3} {'loss': 0.2882, 'learning_rate': 1.629110536111488e-05, 'epoch': 0.3} {'loss': 0.8809, 'learning_rate': 1.628626080469615e-05, 'epoch': 0.31} {'loss': 0.8281, 'learning_rate': 1.628141380773453e-05, 'epoch': 0.31} {'loss': 0.2729, 'learning_rate': 1.6276564372111797e-05, 'epoch': 0.31} {'loss': 0.8628, 'learning_rate': 1.6271712499710663e-05, 'epoch': 0.31} {'loss': 0.2815, 'learning_rate': 1.62668581924148e-05, 'epoch': 0.31} {'loss': 0.7881, 'learning_rate': 1.6262001452108807e-05, 'epoch': 0.31} {'loss': 0.834, 'learning_rate': 1.6257142280678247e-05, 'epoch': 0.31} {'loss': 0.8555, 'learning_rate': 1.6252280680009613e-05, 'epoch': 0.31} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/807085707.jpg' {'loss': 0.814, 'learning_rate': 1.6247416651990343e-05, 'epoch': 0.31} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1890916196.jpg' {'loss': 0.8354, 'learning_rate': 1.624255019850883e-05, 'epoch': 0.31} {'loss': 0.812, 'learning_rate': 1.6237681321454387e-05, 'epoch': 0.31} {'loss': 0.937, 'learning_rate': 1.623281002271729e-05, 'epoch': 0.31} [2024-01-31 00:00:19,599] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8379, 'learning_rate': 1.6227936304188738e-05, 'epoch': 0.31} [2024-01-31 00:00:38,503] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.2561, 'learning_rate': 1.622306016776088e-05, 'epoch': 0.31} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1560449152.jpg' {'loss': 0.8457, 'learning_rate': 1.6218181615326795e-05, 'epoch': 0.31} {'loss': 0.8472, 'learning_rate': 1.6213300648780515e-05, 'epoch': 0.31} {'loss': 0.8037, 'learning_rate': 1.620841727001699e-05, 'epoch': 0.31} {'loss': 0.8315, 'learning_rate': 1.6203531480932114e-05, 'epoch': 0.31} {'loss': 0.7778, 'learning_rate': 1.619864328342273e-05, 'epoch': 0.31} {'loss': 0.7944, 'learning_rate': 1.6193752679386593e-05, 'epoch': 0.31} {'loss': 0.7793, 'learning_rate': 1.6188859670722414e-05, 'epoch': 0.31} {'loss': 0.7593, 'learning_rate': 1.6183964259329817e-05, 'epoch': 0.31} {'loss': 0.8135, 'learning_rate': 1.6179066447109376e-05, 'epoch': 0.31} {'loss': 0.7881, 'learning_rate': 1.6174166235962588e-05, 'epoch': 0.31} {'loss': 0.792, 'learning_rate': 1.6169263627791886e-05, 'epoch': 0.31} [2024-01-31 00:04:29,314] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8491, 'learning_rate': 1.616435862450063e-05, 'epoch': 0.31} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1570280762.jpg' {'loss': 0.874, 'learning_rate': 1.615945122799311e-05, 'epoch': 0.31} [2024-01-31 00:05:08,405] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8447, 'learning_rate': 1.6154541440174547e-05, 'epoch': 0.31} [2024-01-31 00:05:27,576] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7993, 'learning_rate': 1.614962926295109e-05, 'epoch': 0.31} {'loss': 0.8501, 'learning_rate': 1.6144714698229814e-05, 'epoch': 0.31} {'loss': 0.8203, 'learning_rate': 1.6139797747918725e-05, 'epoch': 0.31} [2024-01-31 00:06:25,793] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.875, 'learning_rate': 1.613487841392675e-05, 'epoch': 0.31} {'loss': 0.854, 'learning_rate': 1.612995669816375e-05, 'epoch': 0.31} [2024-01-31 00:07:08,412] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.833, 'learning_rate': 1.6125032602540492e-05, 'epoch': 0.31} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/739702505.jpg' [2024-01-31 00:07:26,203] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.812, 'learning_rate': 1.6120106128968686e-05, 'epoch': 0.31} [2024-01-31 00:07:50,185] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7461, 'learning_rate': 1.6115177279360965e-05, 'epoch': 0.31} [2024-01-31 00:08:10,514] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
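The recurring stage3.py warning itself names the mitigation: call get_accelerator().empty_cache() at the same point in the training loop on every rank. A minimal sketch of one way to wire that in, assuming the run is driven by a Hugging Face Trainer (the callback name and the 100-step interval are illustrative, not taken from the original script):

```python
from deepspeed.accelerator import get_accelerator
from transformers import TrainerCallback


class EmptyCacheCallback(TrainerCallback):
    """Flush the CUDA caching allocator every `every_n_steps` optimizer steps.

    Every rank executes the callback at the same global step, so the flushes
    stay synchronized, as the DeepSpeed warning recommends.
    """

    def __init__(self, every_n_steps: int = 100):
        self.every_n_steps = every_n_steps

    def on_step_end(self, args, state, control, **kwargs):
        if state.global_step % self.every_n_steps == 0:
            get_accelerator().empty_cache()
        return control


# Usage (hypothetical): trainer.add_callback(EmptyCacheCallback(every_n_steps=100))
```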
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7666, 'learning_rate': 1.611024605563087e-05, 'epoch': 0.31} {'loss': 0.8086, 'learning_rate': 1.610531245969287e-05, 'epoch': 0.31} {'loss': 0.8467, 'learning_rate': 1.6100376493462368e-05, 'epoch': 0.31} {'loss': 0.8247, 'learning_rate': 1.6095438158855668e-05, 'epoch': 0.31} {'loss': 0.8652, 'learning_rate': 1.609049745779e-05, 'epoch': 0.31} {'loss': 0.8242, 'learning_rate': 1.6085554392183517e-05, 'epoch': 0.31} {'loss': 0.8218, 'learning_rate': 1.608060896395529e-05, 'epoch': 0.31} {'loss': 0.8071, 'learning_rate': 1.60756611750253e-05, 'epoch': 0.31} {'loss': 0.7974, 'learning_rate': 1.6070711027314446e-05, 'epoch': 0.31} {'loss': 0.7744, 'learning_rate': 1.606575852274456e-05, 'epoch': 0.31} {'loss': 0.8569, 'learning_rate': 1.6060803663238357e-05, 'epoch': 0.31} {'loss': 0.7979, 'learning_rate': 1.6055846450719498e-05, 'epoch': 0.31} {'loss': 0.8408, 'learning_rate': 1.6050886887112535e-05, 'epoch': 0.31} {'loss': 0.7944, 'learning_rate': 1.6045924974342945e-05, 'epoch': 0.31} {'loss': 0.9028, 'learning_rate': 1.604096071433711e-05, 'epoch': 0.31} {'loss': 0.8228, 'learning_rate': 1.6035994109022333e-05, 'epoch': 0.31} {'loss': 0.8618, 'learning_rate': 1.6031025160326814e-05, 'epoch': 0.32} {'loss': 0.8643, 'learning_rate': 1.6026053870179678e-05, 'epoch': 0.32} {'loss': 0.8779, 'learning_rate': 1.6021080240510943e-05, 'epoch': 0.32} {'loss': 0.811, 'learning_rate': 1.601610427325155e-05, 'epoch': 0.32} {'loss': 0.2625, 'learning_rate': 1.6011125970333333e-05, 'epoch': 0.32} [2024-01-31 00:14:32,020] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7734, 'learning_rate': 1.600614533368905e-05, 'epoch': 0.32} [2024-01-31 00:14:51,358] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7671, 'learning_rate': 1.6001162365252348e-05, 'epoch': 0.32} [2024-01-31 00:15:10,490] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.896, 'learning_rate': 1.5996177066957787e-05, 'epoch': 0.32} {'loss': 0.8193, 'learning_rate': 1.5991189440740838e-05, 'epoch': 0.32} [2024-01-31 00:15:49,794] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. 
this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7998, 'learning_rate': 1.5986199488537867e-05, 'epoch': 0.32} [2024-01-31 00:16:07,816] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8159, 'learning_rate': 1.598120721228614e-05, 'epoch': 0.32} {'loss': 0.8613, 'learning_rate': 1.5976212613923836e-05, 'epoch': 0.32} {'loss': 0.8521, 'learning_rate': 1.5971215695390026e-05, 'epoch': 0.32} {'loss': 0.8003, 'learning_rate': 1.5966216458624692e-05, 'epoch': 0.32} {'loss': 0.8354, 'learning_rate': 1.5961214905568705e-05, 'epoch': 0.32} {'loss': 0.8486, 'learning_rate': 1.595621103816384e-05, 'epoch': 0.32} {'loss': 0.8198, 'learning_rate': 1.5951204858352772e-05, 'epoch': 0.32} {'loss': 0.812, 'learning_rate': 1.594619636807907e-05, 'epoch': 0.32} {'loss': 0.7827, 'learning_rate': 1.5941185569287206e-05, 'epoch': 0.32} {'loss': 0.8379, 'learning_rate': 1.5936172463922542e-05, 'epoch': 0.32} [2024-01-31 00:19:13,772] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8315, 'learning_rate': 1.593115705393134e-05, 'epoch': 0.32} [2024-01-31 00:19:33,398] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7766, 'learning_rate': 1.5926139341260755e-05, 'epoch': 0.32} [2024-01-31 00:19:52,290] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8657, 'learning_rate': 1.5921119327858835e-05, 'epoch': 0.32} {'loss': 0.8164, 'learning_rate': 1.5916097015674518e-05, 'epoch': 0.32} {'loss': 0.8438, 'learning_rate': 1.5911072406657646e-05, 'epoch': 0.32} {'loss': 0.7842, 'learning_rate': 1.5906045502758943e-05, 'epoch': 0.32} {'loss': 0.771, 'learning_rate': 1.590101630593002e-05, 'epoch': 0.32} {'loss': 0.8477, 'learning_rate': 1.5895984818123392e-05, 'epoch': 0.32} {'loss': 0.8062, 'learning_rate': 1.5890951041292453e-05, 'epoch': 0.32} {'loss': 0.2939, 'learning_rate': 1.588591497739149e-05, 'epoch': 0.32} {'loss': 0.8218, 'learning_rate': 1.5880876628375668e-05, 'epoch': 0.32} {'loss': 0.8091, 'learning_rate': 1.587583599620106e-05, 'epoch': 0.32} {'loss': 0.8125, 'learning_rate': 1.5870793082824604e-05, 'epoch': 0.32} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/785282394.jpg' {'loss': 0.8652, 'learning_rate': 1.5865747890204138e-05, 'epoch': 0.32} {'loss': 0.7783, 'learning_rate': 1.5860700420298377e-05, 'epoch': 0.32} {'loss': 0.813, 'learning_rate': 1.5855650675066924e-05, 'epoch': 0.32} {'loss': 0.8442, 'learning_rate': 1.5850598656470265e-05, 'epoch': 0.32} {'loss': 0.8418, 'learning_rate': 1.584554436646976e-05, 'epoch': 0.32} {'loss': 0.8018, 'learning_rate': 1.5840487807027665e-05, 'epoch': 0.32} {'loss': 0.8677, 'learning_rate': 1.5835428980107113e-05, 'epoch': 0.32} {'loss': 0.8296, 'learning_rate': 1.583036788767211e-05, 'epoch': 0.32} {'loss': 0.8784, 'learning_rate': 1.5825304531687548e-05, 'epoch': 0.32} {'loss': 0.8345, 'learning_rate': 1.5820238914119195e-05, 'epoch': 0.32} {'loss': 0.8242, 'learning_rate': 1.5815171036933697e-05, 'epoch': 0.32} {'loss': 0.8081, 'learning_rate': 1.5810100902098582e-05, 'epoch': 0.32} {'loss': 0.2695, 'learning_rate': 1.580502851158225e-05, 'epoch': 0.32} {'loss': 0.8325, 'learning_rate': 1.5799953867353975e-05, 'epoch': 0.32} {'loss': 0.7832, 'learning_rate': 1.579487697138391e-05, 'epoch': 0.32} {'loss': 0.875, 'learning_rate': 1.5789797825643086e-05, 'epoch': 0.32} {'loss': 0.814, 'learning_rate': 1.5784716432103394e-05, 'epoch': 0.32} {'loss': 0.8057, 'learning_rate': 1.5779632792737608e-05, 'epoch': 0.32} {'loss': 0.7905, 'learning_rate': 1.5774546909519376e-05, 'epoch': 0.32} {'loss': 0.2917, 'learning_rate': 1.5769458784423206e-05, 'epoch': 0.33} {'loss': 0.8789, 'learning_rate': 1.5764368419424488e-05, 'epoch': 0.33} {'loss': 0.8403, 'learning_rate': 1.575927581649948e-05, 'epoch': 0.33} {'loss': 0.8408, 'learning_rate': 1.5754180977625303e-05, 'epoch': 0.33} WARNING: tokenization mismatch: 1 vs. 1590. 
(ignored) {'loss': 0.832, 'learning_rate': 1.574908390477995e-05, 'epoch': 0.33} {'loss': 0.7954, 'learning_rate': 1.5743984599942273e-05, 'epoch': 0.33} {'loss': 0.8228, 'learning_rate': 1.5738883065092005e-05, 'epoch': 0.33} {'loss': 0.8574, 'learning_rate': 1.5733779302209735e-05, 'epoch': 0.33} {'loss': 0.8354, 'learning_rate': 1.572867331327692e-05, 'epoch': 0.33} {'loss': 0.7495, 'learning_rate': 1.5723565100275884e-05, 'epoch': 0.33} {'loss': 0.894, 'learning_rate': 1.5718454665189806e-05, 'epoch': 0.33} {'loss': 0.8457, 'learning_rate': 1.5713342010002733e-05, 'epoch': 0.33} {'loss': 0.7905, 'learning_rate': 1.5708227136699578e-05, 'epoch': 0.33} {'loss': 0.2606, 'learning_rate': 1.5703110047266105e-05, 'epoch': 0.33} {'loss': 0.2821, 'learning_rate': 1.569799074368895e-05, 'epoch': 0.33} {'loss': 0.7979, 'learning_rate': 1.5692869227955603e-05, 'epoch': 0.33} {'loss': 0.8496, 'learning_rate': 1.5687745502054407e-05, 'epoch': 0.33} {'loss': 0.7905, 'learning_rate': 1.5682619567974575e-05, 'epoch': 0.33} {'loss': 0.8076, 'learning_rate': 1.567749142770617e-05, 'epoch': 0.33} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/893463264.jpg' {'loss': 0.8164, 'learning_rate': 1.5672361083240106e-05, 'epoch': 0.33} {'loss': 0.7656, 'learning_rate': 1.5667228536568167e-05, 'epoch': 0.33} {'loss': 0.7715, 'learning_rate': 1.566209378968298e-05, 'epoch': 0.33} {'loss': 0.79, 'learning_rate': 1.565695684457803e-05, 'epoch': 0.33} {'loss': 0.7666, 'learning_rate': 1.5651817703247666e-05, 'epoch': 0.33} {'loss': 0.8291, 'learning_rate': 1.5646676367687067e-05, 'epoch': 0.33} {'loss': 0.8384, 'learning_rate': 1.564153283989228e-05, 'epoch': 0.33} {'loss': 0.8311, 'learning_rate': 1.5636387121860207e-05, 'epoch': 0.33} {'loss': 0.8306, 'learning_rate': 1.5631239215588578e-05, 'epoch': 0.33} {'loss': 0.8125, 'learning_rate': 1.5626089123076004e-05, 'epoch': 0.33} {'loss': 0.855, 'learning_rate': 1.5620936846321917e-05, 'epoch': 0.33} {'loss': 0.8213, 'learning_rate': 1.561578238732661e-05, 'epoch': 0.33} {'loss': 0.8379, 'learning_rate': 1.561062574809123e-05, 'epoch': 0.33} {'loss': 0.8325, 'learning_rate': 1.5605466930617747e-05, 'epoch': 0.33} {'loss': 0.8403, 'learning_rate': 1.5600305936909005e-05, 'epoch': 0.33} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/800756614.jpg' {'loss': 0.8154, 'learning_rate': 1.559514276896867e-05, 'epoch': 0.33} {'loss': 0.8066, 'learning_rate': 1.558997742880127e-05, 'epoch': 0.33} {'loss': 0.7876, 'learning_rate': 1.5584809918412158e-05, 'epoch': 0.33} {'loss': 0.8643, 'learning_rate': 1.557964023980755e-05, 'epoch': 0.33} {'loss': 0.812, 'learning_rate': 1.5574468394994486e-05, 'epoch': 0.33} {'loss': 0.8115, 'learning_rate': 1.5569294385980856e-05, 'epoch': 0.33} {'loss': 0.8506, 'learning_rate': 1.556411821477539e-05, 'epoch': 0.33} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/425099369.jpg' {'loss': 0.7964, 'learning_rate': 1.5558939883387657e-05, 'epoch': 0.33} {'loss': 0.8643, 'learning_rate': 1.5553759393828058e-05, 'epoch': 0.33} {'loss': 0.7861, 'learning_rate': 1.554857674810784e-05, 'epoch': 0.33} {'loss': 0.812, 'learning_rate': 1.554339194823909e-05, 'epoch': 0.33} {'loss': 0.7822, 'learning_rate': 1.553820499623472e-05, 'epoch': 0.33} {'loss': 0.7524, 'learning_rate': 1.553301589410848e-05, 'epoch': 0.33} {'loss': 0.8188, 'learning_rate': 1.5527824643874968e-05, 'epoch': 0.33} {'loss': 0.814, 'learning_rate': 1.5522631247549598e-05, 'epoch': 0.33} {'loss': 0.7627, 
'learning_rate': 1.5517435707148628e-05, 'epoch': 0.33} {'loss': 0.8374, 'learning_rate': 1.5512238024689144e-05, 'epoch': 0.33} {'loss': 0.856, 'learning_rate': 1.550703820218907e-05, 'epoch': 0.33} {'loss': 0.8013, 'learning_rate': 1.550183624166715e-05, 'epoch': 0.34} {'loss': 0.8291, 'learning_rate': 1.549663214514297e-05, 'epoch': 0.34} {'loss': 0.8311, 'learning_rate': 1.5491425914636934e-05, 'epoch': 0.34} {'loss': 0.8564, 'learning_rate': 1.5486217552170283e-05, 'epoch': 0.34} {'loss': 0.8457, 'learning_rate': 1.548100705976508e-05, 'epoch': 0.34} {'loss': 0.8149, 'learning_rate': 1.5475794439444226e-05, 'epoch': 0.34} {'loss': 0.8071, 'learning_rate': 1.5470579693231432e-05, 'epoch': 0.34} {'loss': 0.7683, 'learning_rate': 1.5465362823151245e-05, 'epoch': 0.34} {'loss': 0.7988, 'learning_rate': 1.5460143831229026e-05, 'epoch': 0.34} {'loss': 0.8262, 'learning_rate': 1.545492271949098e-05, 'epoch': 0.34} {'loss': 0.7881, 'learning_rate': 1.544969948996411e-05, 'epoch': 0.34} {'loss': 0.8521, 'learning_rate': 1.544447414467626e-05, 'epoch': 0.34} {'loss': 0.8027, 'learning_rate': 1.5439246685656093e-05, 'epoch': 0.34} {'loss': 0.2865, 'learning_rate': 1.5434017114933082e-05, 'epoch': 0.34} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1890838004.jpg' {'loss': 0.2699, 'learning_rate': 1.5428785434537527e-05, 'epoch': 0.34} {'loss': 0.8613, 'learning_rate': 1.542355164650055e-05, 'epoch': 0.34} {'loss': 0.8174, 'learning_rate': 1.541831575285408e-05, 'epoch': 0.34} {'loss': 0.8359, 'learning_rate': 1.541307775563088e-05, 'epoch': 0.34} {'loss': 0.2687, 'learning_rate': 1.540783765686452e-05, 'epoch': 0.34} [2024-01-31 00:50:33,901] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8184, 'learning_rate': 1.540259545858938e-05, 'epoch': 0.34} [2024-01-31 00:50:53,690] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7861, 'learning_rate': 1.539735116284067e-05, 'epoch': 0.34} {'loss': 0.8164, 'learning_rate': 1.53921047716544e-05, 'epoch': 0.34} {'loss': 0.8066, 'learning_rate': 1.53868562870674e-05, 'epoch': 0.34} {'loss': 0.8408, 'learning_rate': 1.5381605711117318e-05, 'epoch': 0.34} {'loss': 0.8281, 'learning_rate': 1.5376353045842604e-05, 'epoch': 0.34} {'loss': 0.7969, 'learning_rate': 1.5371098293282526e-05, 'epoch': 0.34} {'loss': 0.8604, 'learning_rate': 1.5365841455477158e-05, 'epoch': 0.34} {'loss': 0.8291, 'learning_rate': 1.5360582534467382e-05, 'epoch': 0.34} {'loss': 0.8135, 'learning_rate': 1.5355321532294897e-05, 'epoch': 0.34} {'loss': 0.8389, 'learning_rate': 1.5350058451002204e-05, 'epoch': 0.34} {'loss': 0.8384, 'learning_rate': 1.5344793292632614e-05, 'epoch': 0.34} {'loss': 0.8174, 'learning_rate': 1.533952605923024e-05, 'epoch': 0.34} {'loss': 0.8091, 'learning_rate': 1.5334256752840007e-05, 'epoch': 0.34} {'loss': 0.8428, 'learning_rate': 1.532898537550764e-05, 'epoch': 0.34} {'loss': 0.7925, 'learning_rate': 1.532371192927966e-05, 'epoch': 0.34} {'loss': 0.8486, 'learning_rate': 1.5318436416203412e-05, 'epoch': 0.34} {'loss': 0.7939, 'learning_rate': 1.531315883832703e-05, 'epoch': 0.34} {'loss': 0.8013, 'learning_rate': 1.530787919769945e-05, 'epoch': 0.34} {'loss': 0.8057, 'learning_rate': 1.5302597496370408e-05, 'epoch': 0.34} {'loss': 0.8018, 'learning_rate': 1.5297313736390447e-05, 'epoch': 0.34} {'loss': 0.299, 'learning_rate': 1.5292027919810898e-05, 'epoch': 0.34} {'loss': 0.8384, 'learning_rate': 1.52867400486839e-05, 'epoch': 0.34} {'loss': 0.7793, 'learning_rate': 1.528145012506239e-05, 'epoch': 0.34} {'loss': 0.791, 'learning_rate': 1.5276158151000096e-05, 'epoch': 0.34} {'loss': 0.8477, 'learning_rate': 1.5270864128551542e-05, 'epoch': 0.34} {'loss': 0.7671, 'learning_rate': 1.5265568059772053e-05, 'epoch': 0.34} {'loss': 0.8491, 'learning_rate': 1.5260269946717746e-05, 'epoch': 0.34} {'loss': 0.8286, 'learning_rate': 1.5254969791445526e-05, 'epoch': 0.34} {'loss': 0.8066, 'learning_rate': 1.5249667596013102e-05, 'epoch': 0.34} {'loss': 0.8062, 'learning_rate': 1.5244363362478967e-05, 'epoch': 0.34} {'loss': 0.8379, 'learning_rate': 1.5239057092902404e-05, 'epoch': 0.34} {'loss': 0.7832, 'learning_rate': 1.523374878934349e-05, 'epoch': 0.34} [2024-01-31 01:00:47,584] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8789, 'learning_rate': 1.5228438453863095e-05, 'epoch': 0.35} [2024-01-31 01:01:10,141] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7876, 'learning_rate': 1.522312608852287e-05, 'epoch': 0.35} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/898620996.jpg' {'loss': 0.8369, 'learning_rate': 1.5217811695385263e-05, 'epoch': 0.35} {'loss': 0.7734, 'learning_rate': 1.52124952765135e-05, 'epoch': 0.35} {'loss': 0.8706, 'learning_rate': 1.5207176833971598e-05, 'epoch': 0.35} {'loss': 0.8472, 'learning_rate': 1.520185636982436e-05, 'epoch': 0.35} {'loss': 0.7988, 'learning_rate': 1.5196533886137376e-05, 'epoch': 0.35} {'loss': 0.8013, 'learning_rate': 1.5191209384977014e-05, 'epoch': 0.35} {'loss': 0.8413, 'learning_rate': 1.5185882868410431e-05, 'epoch': 0.35} {'loss': 0.7505, 'learning_rate': 1.5180554338505564e-05, 'epoch': 0.35} {'loss': 0.8608, 'learning_rate': 1.517522379733113e-05, 'epoch': 0.35} {'loss': 0.73, 'learning_rate': 1.5169891246956629e-05, 'epoch': 0.35} {'loss': 0.7993, 'learning_rate': 1.5164556689452346e-05, 'epoch': 0.35} {'loss': 0.8076, 'learning_rate': 1.5159220126889329e-05, 'epoch': 0.35} {'loss': 0.8335, 'learning_rate': 1.5153881561339426e-05, 'epoch': 0.35} {'loss': 0.8423, 'learning_rate': 1.5148540994875242e-05, 'epoch': 0.35} {'loss': 0.8198, 'learning_rate': 1.5143198429570181e-05, 'epoch': 0.35} {'loss': 0.2799, 'learning_rate': 1.5137853867498403e-05, 'epoch': 0.35} {'loss': 0.8145, 'learning_rate': 1.5132507310734847e-05, 'epoch': 0.35} {'loss': 0.2935, 'learning_rate': 1.5127158761355241e-05, 'epoch': 0.35} {'loss': 0.8125, 'learning_rate': 1.512180822143607e-05, 'epoch': 0.35} {'loss': 0.7905, 'learning_rate': 1.5116455693054594e-05, 'epoch': 0.35} {'loss': 0.8091, 'learning_rate': 1.5111101178288858e-05, 'epoch': 0.35} {'loss': 0.8345, 'learning_rate': 1.510574467921766e-05, 'epoch': 0.35} {'loss': 0.7739, 'learning_rate': 1.5100386197920585e-05, 'epoch': 0.35} {'loss': 0.8696, 'learning_rate': 1.5095025736477977e-05, 'epoch': 0.35} {'loss': 0.729, 'learning_rate': 1.5089663296970952e-05, 'epoch': 0.35} {'loss': 0.8403, 'learning_rate': 1.5084298881481388e-05, 'epoch': 0.35} {'loss': 0.2571, 'learning_rate': 1.5078932492091942e-05, 'epoch': 0.35} {'loss': 0.7583, 'learning_rate': 1.5073564130886032e-05, 'epoch': 0.35} {'loss': 0.7749, 'learning_rate': 1.506819379994784e-05, 'epoch': 0.35} {'loss': 0.8037, 'learning_rate': 1.5062821501362308e-05, 'epoch': 0.35} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/185230863X.jpg' {'loss': 0.7939, 'learning_rate': 1.5057447237215152e-05, 'epoch': 0.35} {'loss': 0.8447, 'learning_rate': 1.5052071009592846e-05, 'epoch': 0.35} {'loss': 0.2614, 'learning_rate': 1.5046692820582625e-05, 'epoch': 0.35} {'loss': 0.8442, 'learning_rate': 1.504131267227249e-05, 'epoch': 0.35} {'loss': 0.8721, 'learning_rate': 1.5035930566751198e-05, 'epoch': 0.35} {'loss': 0.7856, 'learning_rate': 1.5030546506108268e-05, 'epoch': 0.35} {'loss': 0.8013, 'learning_rate': 1.5025160492433976e-05, 'epoch': 0.35} {'loss': 0.7949, 'learning_rate': 1.501977252781936e-05, 'epoch': 0.35} {'loss': 0.8022, 'learning_rate': 1.5014382614356213e-05, 'epoch': 0.35} {'loss': 0.8618, 'learning_rate': 1.5008990754137088e-05, 'epoch': 0.35} {'loss': 0.7822, 'learning_rate': 1.5003596949255284e-05, 'epoch': 0.35} {'loss': 0.8115, 'learning_rate': 1.4998201201804867e-05, 'epoch': 0.35} {'loss': 0.2701, 'learning_rate': 1.499280351388065e-05, 'epoch': 
0.35} {'loss': 0.8521, 'learning_rate': 1.49874038875782e-05, 'epoch': 0.35} {'loss': 0.7896, 'learning_rate': 1.498200232499384e-05, 'epoch': 0.35} {'loss': 0.8462, 'learning_rate': 1.4976598828224643e-05, 'epoch': 0.35} {'loss': 0.8384, 'learning_rate': 1.497119339936843e-05, 'epoch': 0.35} {'loss': 0.8198, 'learning_rate': 1.4965786040523779e-05, 'epoch': 0.35} {'loss': 0.7412, 'learning_rate': 1.496037675379001e-05, 'epoch': 0.35} {'loss': 0.8647, 'learning_rate': 1.4954965541267192e-05, 'epoch': 0.35} {'loss': 0.8496, 'learning_rate': 1.494955240505615e-05, 'epoch': 0.36} {'loss': 0.8374, 'learning_rate': 1.494413734725844e-05, 'epoch': 0.36} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1562614479.jpg' {'loss': 0.793, 'learning_rate': 1.4938720369976385e-05, 'epoch': 0.36} {'loss': 0.8022, 'learning_rate': 1.4933301475313036e-05, 'epoch': 0.36} {'loss': 0.8188, 'learning_rate': 1.4927880665372197e-05, 'epoch': 0.36} {'loss': 0.2986, 'learning_rate': 1.4922457942258411e-05, 'epoch': 0.36} {'loss': 0.8149, 'learning_rate': 1.4917033308076967e-05, 'epoch': 0.36} {'loss': 0.7808, 'learning_rate': 1.4911606764933892e-05, 'epoch': 0.36} {'loss': 0.8027, 'learning_rate': 1.490617831493596e-05, 'epoch': 0.36} {'loss': 0.7666, 'learning_rate': 1.4900747960190682e-05, 'epoch': 0.36} {'loss': 0.8164, 'learning_rate': 1.489531570280631e-05, 'epoch': 0.36} {'loss': 0.7368, 'learning_rate': 1.488988154489183e-05, 'epoch': 0.36} {'loss': 0.7852, 'learning_rate': 1.4884445488556972e-05, 'epoch': 0.36} {'loss': 0.7832, 'learning_rate': 1.4879007535912198e-05, 'epoch': 0.36} [2024-01-31 01:21:22,110] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7734, 'learning_rate': 1.4873567689068708e-05, 'epoch': 0.36} [2024-01-31 01:21:40,948] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8008, 'learning_rate': 1.4868125950138442e-05, 'epoch': 0.36} {'loss': 0.8325, 'learning_rate': 1.4862682321234064e-05, 'epoch': 0.36} [2024-01-31 01:22:21,535] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7949, 'learning_rate': 1.4857236804468983e-05, 'epoch': 0.36} {'loss': 0.8398, 'learning_rate': 1.4851789401957338e-05, 'epoch': 0.36} [2024-01-31 01:22:57,134] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. 
this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7515, 'learning_rate': 1.4846340115813993e-05, 'epoch': 0.36} [2024-01-31 01:23:15,616] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8291, 'learning_rate': 1.484088894815455e-05, 'epoch': 0.36} [2024-01-31 01:23:33,265] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7988, 'learning_rate': 1.4835435901095341e-05, 'epoch': 0.36} {'loss': 0.7979, 'learning_rate': 1.4829980976753426e-05, 'epoch': 0.36} {'loss': 0.8159, 'learning_rate': 1.4824524177246597e-05, 'epoch': 0.36} [2024-01-31 01:24:28,207] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7793, 'learning_rate': 1.4819065504693365e-05, 'epoch': 0.36} {'loss': 0.8438, 'learning_rate': 1.4813604961212984e-05, 'epoch': 0.36} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/871563932.jpg' {'loss': 0.8247, 'learning_rate': 1.4808142548925417e-05, 'epoch': 0.36} {'loss': 0.7876, 'learning_rate': 1.4802678269951365e-05, 'epoch': 0.36} {'loss': 0.8379, 'learning_rate': 1.4797212126412243e-05, 'epoch': 0.36} {'loss': 0.7969, 'learning_rate': 1.4791744120430202e-05, 'epoch': 0.36} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/965150739.jpg' {'loss': 0.7607, 'learning_rate': 1.4786274254128112e-05, 'epoch': 0.36} {'loss': 0.8501, 'learning_rate': 1.4780802529629559e-05, 'epoch': 0.36} {'loss': 0.7861, 'learning_rate': 1.4775328949058856e-05, 'epoch': 0.36} {'loss': 0.8364, 'learning_rate': 1.4769853514541037e-05, 'epoch': 0.36} {'loss': 0.8521, 'learning_rate': 1.4764376228201848e-05, 'epoch': 0.36} {'loss': 0.7729, 'learning_rate': 1.475889709216777e-05, 'epoch': 0.36} {'loss': 0.8271, 'learning_rate': 1.4753416108565985e-05, 'epoch': 0.36} {'loss': 0.7817, 'learning_rate': 1.47479332795244e-05, 'epoch': 0.36} {'loss': 0.8398, 'learning_rate': 1.4742448607171644e-05, 'epoch': 0.36} {'loss': 0.7925, 'learning_rate': 1.473696209363705e-05, 'epoch': 0.36} {'loss': 0.8472, 'learning_rate': 1.4731473741050673e-05, 'epoch': 0.36} {'loss': 0.8394, 'learning_rate': 1.4725983551543279e-05, 'epoch': 0.36} {'loss': 0.8047, 'learning_rate': 1.472049152724635e-05, 'epoch': 0.36} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1574301012.jpg' {'loss': 0.8115, 'learning_rate': 1.471499767029208e-05, 'epoch': 0.36} {'loss': 0.8149, 'learning_rate': 1.470950198281337e-05, 'epoch': 0.36} {'loss': 0.8237, 'learning_rate': 1.470400446694384e-05, 'epoch': 0.36} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/679445765.jpg' {'loss': 0.2828, 'learning_rate': 1.4698505124817811e-05, 'epoch': 0.36} {'loss': 0.7739, 'learning_rate': 1.4693003958570318e-05, 'epoch': 0.36} {'loss': 0.7905, 'learning_rate': 1.4687500970337103e-05, 'epoch': 0.36} {'loss': 0.8257, 'learning_rate': 1.4681996162254618e-05, 'epoch': 0.36} {'loss': 0.812, 'learning_rate': 1.4676489536460015e-05, 'epoch': 0.36} {'loss': 0.7861, 'learning_rate': 1.467098109509116e-05, 'epoch': 0.36} [2024-01-31 01:33:00,581] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7891, 'learning_rate': 1.4665470840286614e-05, 'epoch': 0.37} {'loss': 0.8003, 'learning_rate': 1.4659958774185654e-05, 'epoch': 0.37} {'loss': 0.8564, 'learning_rate': 1.4654444898928249e-05, 'epoch': 0.37} {'loss': 0.8066, 'learning_rate': 1.4648929216655077e-05, 'epoch': 0.37} {'loss': 0.8242, 'learning_rate': 1.4643411729507517e-05, 'epoch': 0.37} {'loss': 0.8252, 'learning_rate': 1.4637892439627644e-05, 'epoch': 0.37} {'loss': 0.7969, 'learning_rate': 1.4632371349158241e-05, 'epoch': 0.37} {'loss': 0.8486, 'learning_rate': 1.4626848460242782e-05, 'epoch': 0.37} {'loss': 0.8208, 'learning_rate': 1.4621323775025444e-05, 'epoch': 0.37} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/942627458.jpg' {'loss': 0.7998, 'learning_rate': 1.4615797295651099e-05, 'epoch': 0.37} {'loss': 0.8379, 'learning_rate': 1.4610269024265317e-05, 'epoch': 0.37} {'loss': 0.2877, 'learning_rate': 1.4604738963014365e-05, 'epoch': 0.37} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1558743030.jpg' {'loss': 0.8496, 'learning_rate': 1.4599207114045202e-05, 'epoch': 0.37} {'loss': 0.8384, 'learning_rate': 1.4593673479505482e-05, 'epoch': 0.37} {'loss': 0.255, 'learning_rate': 1.4588138061543551e-05, 'epoch': 0.37} {'loss': 0.7573, 'learning_rate': 1.458260086230845e-05, 'epoch': 0.37} {'loss': 0.8584, 'learning_rate': 1.4577061883949912e-05, 'epoch': 0.37} {'loss': 0.8252, 'learning_rate': 1.4571521128618358e-05, 'epoch': 0.37} {'loss': 0.8394, 'learning_rate': 1.4565978598464895e-05, 'epoch': 0.37} [2024-01-31 01:38:51,410] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7769, 'learning_rate': 1.4560434295641338e-05, 'epoch': 0.37} [2024-01-31 01:39:08,169] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8276, 'learning_rate': 1.455488822230016e-05, 'epoch': 0.37} {'loss': 0.7925, 'learning_rate': 1.4549340380594545e-05, 'epoch': 0.37} {'loss': 0.8027, 'learning_rate': 1.454379077267836e-05, 'epoch': 0.37} {'loss': 0.7969, 'learning_rate': 1.4538239400706147e-05, 'epoch': 0.37} {'loss': 0.8169, 'learning_rate': 1.4532686266833143e-05, 'epoch': 0.37} [2024-01-31 01:40:46,329] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7998, 'learning_rate': 1.4527131373215265e-05, 'epoch': 0.37} [2024-01-31 01:41:03,246] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8667, 'learning_rate': 1.4521574722009115e-05, 'epoch': 0.37} {'loss': 0.8091, 'learning_rate': 1.4516016315371974e-05, 'epoch': 0.37} [2024-01-31 01:41:44,061] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8188, 'learning_rate': 1.4510456155461807e-05, 'epoch': 0.37} [2024-01-31 01:42:05,856] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7563, 'learning_rate': 1.4504894244437264e-05, 'epoch': 0.37} {'loss': 0.8706, 'learning_rate': 1.4499330584457667e-05, 'epoch': 0.37} {'loss': 0.7939, 'learning_rate': 1.4493765177683017e-05, 'epoch': 0.37} {'loss': 0.8652, 'learning_rate': 1.4488198026274007e-05, 'epoch': 0.37} {'loss': 0.791, 'learning_rate': 1.4482629132391985e-05, 'epoch': 0.37} {'loss': 0.8345, 'learning_rate': 1.4477058498198993e-05, 'epoch': 0.37} {'loss': 0.7617, 'learning_rate': 1.4471486125857743e-05, 'epoch': 0.37} {'loss': 0.8359, 'learning_rate': 1.446591201753162e-05, 'epoch': 0.37} {'loss': 0.8198, 'learning_rate': 1.4460336175384688e-05, 'epoch': 0.37} {'loss': 0.2666, 'learning_rate': 1.4454758601581675e-05, 'epoch': 0.37} {'loss': 0.874, 'learning_rate': 1.4449179298287999e-05, 'epoch': 0.37} {'loss': 0.8291, 'learning_rate': 1.4443598267669723e-05, 'epoch': 0.37} {'loss': 0.8589, 'learning_rate': 1.4438015511893602e-05, 'epoch': 0.37} {'loss': 0.8154, 'learning_rate': 1.4432431033127056e-05, 'epoch': 0.37} {'loss': 0.7935, 'learning_rate': 1.442684483353817e-05, 'epoch': 0.37} {'loss': 0.7905, 'learning_rate': 1.4421256915295697e-05, 'epoch': 0.37} {'loss': 0.8101, 'learning_rate': 1.4415667280569064e-05, 'epoch': 0.37} [2024-01-31 01:47:24,618] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8145, 'learning_rate': 1.4410075931528356e-05, 'epoch': 0.37} {'loss': 0.8037, 'learning_rate': 1.4404482870344322e-05, 'epoch': 0.37} [2024-01-31 01:48:01,487] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8057, 'learning_rate': 1.4398888099188396e-05, 'epoch': 0.37} {'loss': 0.8555, 'learning_rate': 1.4393291620232646e-05, 'epoch': 0.37} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/295968265.jpg' [2024-01-31 01:48:36,524] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8037, 'learning_rate': 1.4387693435649826e-05, 'epoch': 0.37} WARNING: tokenization mismatch: 1 vs. 624. (ignored) {'loss': 0.7788, 'learning_rate': 1.4382093547613338e-05, 'epoch': 0.37} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/789401592.jpg' {'loss': 0.8506, 'learning_rate': 1.4376491958297263e-05, 'epoch': 0.38} [2024-01-31 01:49:32,864] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.277, 'learning_rate': 1.4370888669876317e-05, 'epoch': 0.38} [2024-01-31 01:49:51,828] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8394, 'learning_rate': 1.4365283684525895e-05, 'epoch': 0.38} {'loss': 0.8169, 'learning_rate': 1.4359677004422045e-05, 'epoch': 0.38} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/412132710.jpg' [2024-01-31 01:50:31,023] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8242, 'learning_rate': 1.4354068631741476e-05, 'epoch': 0.38} [2024-01-31 01:50:49,330] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7656, 'learning_rate': 1.4348458568661548e-05, 'epoch': 0.38} {'loss': 0.8301, 'learning_rate': 1.434284681736028e-05, 'epoch': 0.38} [2024-01-31 01:51:25,768] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8071, 'learning_rate': 1.4337233380016354e-05, 'epoch': 0.38} [2024-01-31 01:51:45,624] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.2557, 'learning_rate': 1.433161825880909e-05, 'epoch': 0.38} [2024-01-31 01:52:05,326] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8843, 'learning_rate': 1.432600145591848e-05, 'epoch': 0.38} {'loss': 0.7993, 'learning_rate': 1.4320382973525151e-05, 'epoch': 0.38} {'loss': 0.7983, 'learning_rate': 1.43147628138104e-05, 'epoch': 0.38} {'loss': 0.8247, 'learning_rate': 1.4309140978956161e-05, 'epoch': 0.38} {'loss': 0.7451, 'learning_rate': 1.430351747114503e-05, 'epoch': 0.38} {'loss': 0.8267, 'learning_rate': 1.429789229256024e-05, 'epoch': 0.38} {'loss': 0.8125, 'learning_rate': 1.429226544538568e-05, 'epoch': 0.38} {'loss': 0.8267, 'learning_rate': 1.4286636931805887e-05, 'epoch': 0.38} {'loss': 0.8003, 'learning_rate': 1.4281006754006045e-05, 'epoch': 0.38} {'loss': 0.259, 'learning_rate': 1.427537491417198e-05, 'epoch': 0.38} {'loss': 0.7671, 'learning_rate': 1.426974141449017e-05, 'epoch': 0.38} {'loss': 0.7959, 'learning_rate': 1.4264106257147732e-05, 'epoch': 0.38} {'loss': 0.8208, 'learning_rate': 1.4258469444332423e-05, 'epoch': 0.38} {'loss': 0.2562, 'learning_rate': 1.4252830978232658e-05, 'epoch': 0.38} {'loss': 0.8252, 'learning_rate': 1.4247190861037474e-05, 'epoch': 0.38} {'loss': 0.8018, 'learning_rate': 1.4241549094936567e-05, 'epoch': 0.38} {'loss': 0.8271, 'learning_rate': 1.4235905682120255e-05, 'epoch': 0.38} {'loss': 0.7573, 'learning_rate': 1.4230260624779512e-05, 'epoch': 0.38} {'loss': 0.8164, 'learning_rate': 1.4224613925105947e-05, 'epoch': 0.38} {'loss': 0.8535, 'learning_rate': 1.4218965585291792e-05, 'epoch': 0.38} {'loss': 0.7656, 'learning_rate': 1.4213315607529939e-05, 'epoch': 0.38} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/789408554.jpg' {'loss': 0.853, 'learning_rate': 1.4207663994013896e-05, 'epoch': 0.38} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/3884452762.jpg' {'loss': 0.8457, 'learning_rate': 1.4202010746937815e-05, 'epoch': 0.38} {'loss': 0.7822, 'learning_rate': 1.4196355868496485e-05, 'epoch': 0.38} {'loss': 0.8184, 'learning_rate': 1.4190699360885323e-05, 'epoch': 0.38} {'loss': 0.8359, 'learning_rate': 1.4185041226300376e-05, 'epoch': 0.38} {'loss': 0.7905, 'learning_rate': 1.4179381466938332e-05, 'epoch': 0.38} {'loss': 0.8418, 'learning_rate': 1.4173720084996501e-05, 'epoch': 0.38} {'loss': 0.8354, 'learning_rate': 1.4168057082672828e-05, 'epoch': 0.38} {'loss': 0.8809, 'learning_rate': 1.4162392462165884e-05, 'epoch': 0.38} {'loss': 0.7515, 'learning_rate': 1.4156726225674874e-05, 'epoch': 0.38} {'loss': 0.8276, 'learning_rate': 1.415105837539962e-05, 'epoch': 0.38} {'loss': 0.7964, 'learning_rate': 1.414538891354058e-05, 'epoch': 0.38} {'loss': 0.7358, 'learning_rate': 1.4139717842298835e-05, 'epoch': 0.38} {'loss': 0.27, 'learning_rate': 1.4134045163876086e-05, 'epoch': 0.38} {'loss': 0.8271, 'learning_rate': 1.4128370880474667e-05, 'epoch': 0.38} {'loss': 0.8213, 'learning_rate': 1.412269499429753e-05, 'epoch': 0.38} {'loss': 0.811, 'learning_rate': 1.4117017507548244e-05, 'epoch': 0.38} {'loss': 0.8145, 'learning_rate': 1.4111338422431013e-05, 'epoch': 0.38} {'loss': 0.8286, 'learning_rate': 1.4105657741150648e-05, 'epoch': 0.38} {'loss': 0.2679, 'learning_rate': 1.4099975465912584e-05, 'epoch': 0.38} {'loss': 0.8228, 'learning_rate': 1.4094291598922877e-05, 'epoch': 0.38} {'loss': 0.8276, 'learning_rate': 1.40886061423882e-05, 'epoch': 0.38} {'loss': 0.8311, 'learning_rate': 1.4082919098515846e-05, 'epoch': 
0.39} {'loss': 0.2465, 'learning_rate': 1.407723046951372e-05, 'epoch': 0.39} {'loss': 0.8193, 'learning_rate': 1.4071540257590341e-05, 'epoch': 0.39} {'loss': 0.8296, 'learning_rate': 1.4065848464954848e-05, 'epoch': 0.39} {'loss': 0.2933, 'learning_rate': 1.4060155093816988e-05, 'epoch': 0.39} {'loss': 0.7827, 'learning_rate': 1.4054460146387124e-05, 'epoch': 0.39} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/877792356.jpg' {'loss': 0.8047, 'learning_rate': 1.4048763624876233e-05, 'epoch': 0.39} {'loss': 0.8438, 'learning_rate': 1.4043065531495904e-05, 'epoch': 0.39} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/517884283.jpg' {'loss': 0.2871, 'learning_rate': 1.4037365868458325e-05, 'epoch': 0.39} {'loss': 0.7959, 'learning_rate': 1.4031664637976305e-05, 'epoch': 0.39} {'loss': 0.7183, 'learning_rate': 1.402596184226326e-05, 'epoch': 0.39} {'loss': 0.8442, 'learning_rate': 1.4020257483533208e-05, 'epoch': 0.39} {'loss': 0.8174, 'learning_rate': 1.401455156400078e-05, 'epoch': 0.39} {'loss': 0.814, 'learning_rate': 1.400884408588121e-05, 'epoch': 0.39} {'loss': 0.7964, 'learning_rate': 1.400313505139034e-05, 'epoch': 0.39} {'loss': 0.7935, 'learning_rate': 1.3997424462744607e-05, 'epoch': 0.39} {'loss': 0.8398, 'learning_rate': 1.3991712322161065e-05, 'epoch': 0.39} {'loss': 0.8457, 'learning_rate': 1.3985998631857359e-05, 'epoch': 0.39} {'loss': 0.8008, 'learning_rate': 1.398028339405174e-05, 'epoch': 0.39} [2024-01-31 02:14:02,330] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7568, 'learning_rate': 1.3974566610963068e-05, 'epoch': 0.39} [2024-01-31 02:14:21,166] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8369, 'learning_rate': 1.3968848284810785e-05, 'epoch': 0.39} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1579771009.jpg' {'loss': 0.7754, 'learning_rate': 1.3963128417814951e-05, 'epoch': 0.39} [2024-01-31 02:14:56,878] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7939, 'learning_rate': 1.3957407012196204e-05, 'epoch': 0.39} {'loss': 0.8125, 'learning_rate': 1.3951684070175802e-05, 'epoch': 0.39} {'loss': 0.7598, 'learning_rate': 1.3945959593975582e-05, 'epoch': 0.39} {'loss': 0.835, 'learning_rate': 1.3940233585817984e-05, 'epoch': 0.39} [2024-01-31 02:16:14,040] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8057, 'learning_rate': 1.3934506047926042e-05, 'epoch': 0.39} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1569750882.jpg' {'loss': 0.7896, 'learning_rate': 1.3928776982523384e-05, 'epoch': 0.39} {'loss': 0.7915, 'learning_rate': 1.3923046391834229e-05, 'epoch': 0.39} {'loss': 0.8145, 'learning_rate': 1.3917314278083391e-05, 'epoch': 0.39} [2024-01-31 02:17:29,035] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8682, 'learning_rate': 1.3911580643496272e-05, 'epoch': 0.39} {'loss': 0.8193, 'learning_rate': 1.3905845490298867e-05, 'epoch': 0.39} {'loss': 0.8472, 'learning_rate': 1.390010882071776e-05, 'epoch': 0.39} [2024-01-31 02:18:26,781] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7354, 'learning_rate': 1.3894370636980128e-05, 'epoch': 0.39} {'loss': 0.8003, 'learning_rate': 1.3888630941313728e-05, 'epoch': 0.39} {'loss': 0.8696, 'learning_rate': 1.3882889735946901e-05, 'epoch': 0.39} {'loss': 0.8018, 'learning_rate': 1.3877147023108592e-05, 'epoch': 0.39} {'loss': 0.7964, 'learning_rate': 1.3871402805028314e-05, 'epoch': 0.39} {'loss': 0.2812, 'learning_rate': 1.3865657083936167e-05, 'epoch': 0.39} {'loss': 0.8472, 'learning_rate': 1.3859909862062844e-05, 'epoch': 0.39} {'loss': 0.2275, 'learning_rate': 1.385416114163961e-05, 'epoch': 0.39} {'loss': 0.8481, 'learning_rate': 1.3848410924898321e-05, 'epoch': 0.39} {'loss': 0.8159, 'learning_rate': 1.3842659214071406e-05, 'epoch': 0.39} {'loss': 0.8247, 'learning_rate': 1.3836906011391878e-05, 'epoch': 0.39} {'loss': 0.8149, 'learning_rate': 1.3831151319093323e-05, 'epoch': 0.39} {'loss': 0.769, 'learning_rate': 1.382539513940992e-05, 'epoch': 0.39} {'loss': 0.8281, 'learning_rate': 1.3819637474576411e-05, 'epoch': 0.39} {'loss': 0.7812, 'learning_rate': 1.381387832682812e-05, 'epoch': 0.39} {'loss': 0.8638, 'learning_rate': 1.380811769840095e-05, 'epoch': 0.39} {'loss': 0.7998, 'learning_rate': 1.3802355591531366e-05, 'epoch': 0.39} {'loss': 0.8403, 'learning_rate': 1.3796592008456427e-05, 'epoch': 0.39} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1566250420.jpg' {'loss': 0.8115, 'learning_rate': 1.3790826951413747e-05, 'epoch': 0.39} {'loss': 0.2715, 'learning_rate': 1.3785060422641526e-05, 'epoch': 0.4} {'loss': 0.8027, 'learning_rate': 1.3779292424378521e-05, 'epoch': 0.4} {'loss': 0.8145, 'learning_rate': 1.3773522958864076e-05, 'epoch': 0.4} {'loss': 0.8115, 'learning_rate': 1.3767752028338091e-05, 'epoch': 0.4} {'loss': 0.7749, 'learning_rate': 1.376197963504104e-05, 'epoch': 0.4} {'loss': 0.2694, 'learning_rate': 1.3756205781213965e-05, 'epoch': 0.4} {'loss': 0.8135, 'learning_rate': 1.375043046909848e-05, 'epoch': 0.4} {'loss': 0.8032, 'learning_rate': 1.3744653700936752e-05, 'epoch': 0.4} {'loss': 0.8252, 'learning_rate': 1.3738875478971526e-05, 'epoch': 0.4} {'loss': 0.8438, 'learning_rate': 1.3733095805446107e-05, 'epoch': 0.4} {'loss': 0.7944, 'learning_rate': 1.372731468260436e-05, 'epoch': 0.4} {'loss': 0.8071, 'learning_rate': 1.372153211269072e-05, 'epoch': 0.4} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/553062204.jpg' {'loss': 0.7866, 'learning_rate': 1.3715748097950176e-05, 'epoch': 0.4} {'loss': 0.8203, 'learning_rate': 1.3709962640628284e-05, 'epoch': 0.4} [2024-01-31 02:28:34,642] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8018, 'learning_rate': 1.3704175742971158e-05, 'epoch': 0.4} [2024-01-31 02:28:52,345] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. 
if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7974, 'learning_rate': 1.369838740722547e-05, 'epoch': 0.4} {'loss': 0.7886, 'learning_rate': 1.3692597635638452e-05, 'epoch': 0.4} {'loss': 0.2617, 'learning_rate': 1.368680643045789e-05, 'epoch': 0.4} {'loss': 0.2758, 'learning_rate': 1.3681013793932132e-05, 'epoch': 0.4} {'loss': 0.8228, 'learning_rate': 1.3675219728310076e-05, 'epoch': 0.4} {'loss': 0.7593, 'learning_rate': 1.3669424235841185e-05, 'epoch': 0.4} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1574320793.jpg' {'loss': 0.7756, 'learning_rate': 1.3663627318775459e-05, 'epoch': 0.4} {'loss': 0.8076, 'learning_rate': 1.3657828979363468e-05, 'epoch': 0.4} {'loss': 0.7871, 'learning_rate': 1.3652029219856324e-05, 'epoch': 0.4} {'loss': 0.7808, 'learning_rate': 1.3646228042505694e-05, 'epoch': 0.4} {'loss': 0.8174, 'learning_rate': 1.3640425449563793e-05, 'epoch': 0.4} {'loss': 0.833, 'learning_rate': 1.3634621443283389e-05, 'epoch': 0.4} {'loss': 0.8057, 'learning_rate': 1.36288160259178e-05, 'epoch': 0.4} {'loss': 0.7539, 'learning_rate': 1.3623009199720882e-05, 'epoch': 0.4} {'loss': 0.813, 'learning_rate': 1.3617200966947053e-05, 'epoch': 0.4} {'loss': 0.769, 'learning_rate': 1.3611391329851262e-05, 'epoch': 0.4} {'loss': 0.8086, 'learning_rate': 1.3605580290689013e-05, 'epoch': 0.4} {'loss': 0.8018, 'learning_rate': 1.3599767851716353e-05, 'epoch': 0.4} {'loss': 0.8057, 'learning_rate': 1.3593954015189867e-05, 'epoch': 0.4} {'loss': 0.8296, 'learning_rate': 1.3588138783366692e-05, 'epoch': 0.4} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/679844023.jpg' {'loss': 0.8379, 'learning_rate': 1.3582322158504495e-05, 'epoch': 0.4} {'loss': 0.8057, 'learning_rate': 1.3576504142861496e-05, 'epoch': 0.4} {'loss': 0.8081, 'learning_rate': 1.3570684738696444e-05, 'epoch': 0.4} {'loss': 0.8276, 'learning_rate': 1.3564863948268631e-05, 'epoch': 0.4} {'loss': 0.8608, 'learning_rate': 1.3559041773837898e-05, 'epoch': 0.4} {'loss': 0.8398, 'learning_rate': 1.3553218217664603e-05, 'epoch': 0.4} {'loss': 0.7812, 'learning_rate': 1.3547393282009656e-05, 'epoch': 0.4} {'loss': 0.8203, 'learning_rate': 1.3541566969134496e-05, 'epoch': 0.4} {'loss': 0.8164, 'learning_rate': 1.3535739281301102e-05, 'epoch': 0.4} {'loss': 0.7666, 'learning_rate': 1.3529910220771975e-05, 'epoch': 0.4} {'loss': 0.8218, 'learning_rate': 1.3524079789810163e-05, 'epoch': 0.4} [2024-01-31 02:38:20,869] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
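For the "consider adjusting settings to reduce memory consumption" part of the same warning, the usual levers in a ZeRO-3 setup are the stage-3 bucket and live-parameter limits in the DeepSpeed config, plus the micro-batch size. A hedged sketch of the kind of values one might lower; the numbers are illustrative and not this run's actual configuration.

```python
# Illustrative ZeRO-3 settings one might tighten to ease memory pressure.
# Values are placeholders for demonstration, not this run's actual config.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "contiguous_gradients": True,
        "reduce_bucket_size": 5e7,            # smaller buckets -> lower peaks
        "stage3_prefetch_bucket_size": 5e7,   # prefetch fewer params at once
        "stage3_max_live_parameters": 5e8,    # cap params kept materialized
        "stage3_max_reuse_distance": 5e8,
        "stage3_param_persistence_threshold": 1e5,
    },
    "train_micro_batch_size_per_gpu": 4,      # lowering this also helps
    "gradient_accumulation_steps": 4,
}
```

If the cache-flush warnings persist after tightening these, gradient checkpointing and a smaller per-GPU micro-batch are the remaining options.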
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8613, 'learning_rate': 1.3518247990679241e-05, 'epoch': 0.4} {'loss': 0.8179, 'learning_rate': 1.3512414825643312e-05, 'epoch': 0.4} {'loss': 0.7759, 'learning_rate': 1.3506580296967011e-05, 'epoch': 0.4} {'loss': 0.7744, 'learning_rate': 1.3500744406915505e-05, 'epoch': 0.4} {'loss': 0.7681, 'learning_rate': 1.3494907157754485e-05, 'epoch': 0.4} {'loss': 0.8862, 'learning_rate': 1.348906855175017e-05, 'epoch': 0.4} {'loss': 0.8765, 'learning_rate': 1.3483228591169315e-05, 'epoch': 0.41} {'loss': 0.7896, 'learning_rate': 1.347738727827919e-05, 'epoch': 0.41} {'loss': 0.8013, 'learning_rate': 1.3471544615347591e-05, 'epoch': 0.41} {'loss': 0.769, 'learning_rate': 1.3465700604642847e-05, 'epoch': 0.41} {'loss': 0.8472, 'learning_rate': 1.34598552484338e-05, 'epoch': 0.41} {'loss': 0.7998, 'learning_rate': 1.3454008548989816e-05, 'epoch': 0.41} {'loss': 0.7783, 'learning_rate': 1.3448160508580789e-05, 'epoch': 0.41} {'loss': 0.8169, 'learning_rate': 1.3442311129477133e-05, 'epoch': 0.41} {'loss': 0.2692, 'learning_rate': 1.343646041394977e-05, 'epoch': 0.41} {'loss': 0.8403, 'learning_rate': 1.3430608364270156e-05, 'epoch': 0.41} {'loss': 0.7856, 'learning_rate': 1.3424754982710256e-05, 'epoch': 0.41} {'loss': 0.2889, 'learning_rate': 1.3418900271542552e-05, 'epoch': 0.41} {'loss': 0.8384, 'learning_rate': 1.3413044233040045e-05, 'epoch': 0.41} {'loss': 0.8096, 'learning_rate': 1.3407186869476253e-05, 'epoch': 0.41} {'loss': 0.7534, 'learning_rate': 1.3401328183125208e-05, 'epoch': 0.41} {'loss': 0.7759, 'learning_rate': 1.339546817626145e-05, 'epoch': 0.41} {'loss': 0.8359, 'learning_rate': 1.3389606851160037e-05, 'epoch': 0.41} {'loss': 0.8564, 'learning_rate': 1.3383744210096537e-05, 'epoch': 0.41} {'loss': 0.7739, 'learning_rate': 1.3377880255347026e-05, 'epoch': 0.41} {'loss': 0.8096, 'learning_rate': 1.3372014989188098e-05, 'epoch': 0.41} {'loss': 0.7744, 'learning_rate': 1.3366148413896851e-05, 'epoch': 0.41} {'loss': 0.2732, 'learning_rate': 1.3360280531750886e-05, 'epoch': 0.41} {'loss': 0.8545, 'learning_rate': 1.3354411345028324e-05, 'epoch': 0.41} {'loss': 0.7715, 'learning_rate': 1.3348540856007782e-05, 'epoch': 0.41} {'loss': 0.8647, 'learning_rate': 1.3342669066968385e-05, 'epoch': 0.41} {'loss': 0.8735, 'learning_rate': 1.3336795980189763e-05, 'epoch': 0.41} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1879505460.jpg' {'loss': 0.8262, 'learning_rate': 1.3330921597952056e-05, 'epoch': 0.41} {'loss': 0.8076, 'learning_rate': 1.3325045922535896e-05, 'epoch': 0.41} {'loss': 0.8462, 'learning_rate': 1.3319168956222423e-05, 'epoch': 0.41} {'loss': 0.7993, 'learning_rate': 1.331329070129328e-05, 'epoch': 0.41} {'loss': 0.8291, 'learning_rate': 1.3307411160030608e-05, 'epoch': 0.41} {'loss': 0.8062, 'learning_rate': 1.3301530334717046e-05, 'epoch': 0.41} {'loss': 0.8423, 'learning_rate': 1.3295648227635729e-05, 'epoch': 0.41} {'loss': 0.853, 'learning_rate': 1.32897648410703e-05, 'epoch': 0.41} {'loss': 0.8086, 'learning_rate': 1.328388017730489e-05, 'epoch': 0.41} {'loss': 0.7278, 'learning_rate': 1.327799423862413e-05, 'epoch': 0.41} {'loss': 0.8398, 'learning_rate': 1.3272107027313142e-05, 'epoch': 0.41} {'loss': 0.8237, 'learning_rate': 1.3266218545657541e-05, 'epoch': 0.41} {'loss': 0.8242, 'learning_rate': 1.326032879594344e-05, 'epoch': 0.41} {'loss': 
0.7222, 'learning_rate': 1.3254437780457448e-05, 'epoch': 0.41} {'loss': 0.8003, 'learning_rate': 1.3248545501486654e-05, 'epoch': 0.41} {'loss': 0.8013, 'learning_rate': 1.3242651961318646e-05, 'epoch': 0.41} {'loss': 0.8301, 'learning_rate': 1.32367571622415e-05, 'epoch': 0.41} {'loss': 0.8691, 'learning_rate': 1.3230861106543777e-05, 'epoch': 0.41} {'loss': 0.7935, 'learning_rate': 1.3224963796514532e-05, 'epoch': 0.41} {'loss': 0.8237, 'learning_rate': 1.32190652344433e-05, 'epoch': 0.41} {'loss': 0.7686, 'learning_rate': 1.3213165422620111e-05, 'epoch': 0.41} {'loss': 0.8408, 'learning_rate': 1.3207264363335472e-05, 'epoch': 0.41} {'loss': 0.8066, 'learning_rate': 1.3201362058880375e-05, 'epoch': 0.41} {'loss': 0.8145, 'learning_rate': 1.3195458511546307e-05, 'epoch': 0.41} {'loss': 0.7788, 'learning_rate': 1.3189553723625217e-05, 'epoch': 0.41} {'loss': 0.7876, 'learning_rate': 1.318364769740955e-05, 'epoch': 0.41} {'loss': 0.853, 'learning_rate': 1.3177740435192235e-05, 'epoch': 0.42} {'loss': 0.7959, 'learning_rate': 1.3171831939266668e-05, 'epoch': 0.42} {'loss': 0.7539, 'learning_rate': 1.3165922211926734e-05, 'epoch': 0.42} {'loss': 0.8135, 'learning_rate': 1.3160011255466791e-05, 'epoch': 0.42} {'loss': 0.8833, 'learning_rate': 1.3154099072181677e-05, 'epoch': 0.42} {'loss': 0.832, 'learning_rate': 1.3148185664366704e-05, 'epoch': 0.42} {'loss': 0.7881, 'learning_rate': 1.314227103431766e-05, 'epoch': 0.42} {'loss': 0.7812, 'learning_rate': 1.3136355184330809e-05, 'epoch': 0.42} {'loss': 0.7964, 'learning_rate': 1.3130438116702888e-05, 'epoch': 0.42} {'loss': 0.7412, 'learning_rate': 1.3124519833731106e-05, 'epoch': 0.42} {'loss': 0.793, 'learning_rate': 1.3118600337713146e-05, 'epoch': 0.42} {'loss': 0.8193, 'learning_rate': 1.3112679630947156e-05, 'epoch': 0.42} {'loss': 0.8159, 'learning_rate': 1.310675771573176e-05, 'epoch': 0.42} {'loss': 0.7832, 'learning_rate': 1.310083459436605e-05, 'epoch': 0.42} {'loss': 0.7749, 'learning_rate': 1.3094910269149587e-05, 'epoch': 0.42} {'loss': 0.8042, 'learning_rate': 1.3088984742382395e-05, 'epoch': 0.42} {'loss': 0.8115, 'learning_rate': 1.3083058016364972e-05, 'epoch': 0.42} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/786405511.jpg' {'loss': 0.8179, 'learning_rate': 1.3077130093398274e-05, 'epoch': 0.42} {'loss': 0.7705, 'learning_rate': 1.3071200975783725e-05, 'epoch': 0.42} {'loss': 0.7598, 'learning_rate': 1.3065270665823206e-05, 'epoch': 0.42} {'loss': 0.8188, 'learning_rate': 1.3059339165819082e-05, 'epoch': 0.42} {'loss': 0.8779, 'learning_rate': 1.3053406478074155e-05, 'epoch': 0.42} {'loss': 0.771, 'learning_rate': 1.3047472604891701e-05, 'epoch': 0.42} {'loss': 0.8413, 'learning_rate': 1.3041537548575455e-05, 'epoch': 0.42} {'loss': 0.772, 'learning_rate': 1.303560131142961e-05, 'epoch': 0.42} {'loss': 0.8071, 'learning_rate': 1.3029663895758814e-05, 'epoch': 0.42} {'loss': 0.8101, 'learning_rate': 1.3023725303868183e-05, 'epoch': 0.42} {'loss': 0.8359, 'learning_rate': 1.3017785538063277e-05, 'epoch': 0.42} {'loss': 0.7891, 'learning_rate': 1.3011844600650121e-05, 'epoch': 0.42} {'loss': 0.8433, 'learning_rate': 1.300590249393519e-05, 'epoch': 0.42} {'loss': 0.8242, 'learning_rate': 1.2999959220225416e-05, 'epoch': 0.42} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1883323460.jpg' {'loss': 0.8467, 'learning_rate': 1.299401478182818e-05, 'epoch': 0.42} {'loss': 0.7988, 'learning_rate': 1.2988069181051314e-05, 'epoch': 0.42} {'loss': 0.8081, 'learning_rate': 
1.2982122420203114e-05, 'epoch': 0.42} {'loss': 0.7666, 'learning_rate': 1.2976174501592313e-05, 'epoch': 0.42} {'loss': 0.7983, 'learning_rate': 1.2970225427528098e-05, 'epoch': 0.42} {'loss': 0.7695, 'learning_rate': 1.2964275200320104e-05, 'epoch': 0.42} {'loss': 0.79, 'learning_rate': 1.2958323822278413e-05, 'epoch': 0.42} {'loss': 0.8384, 'learning_rate': 1.2952371295713558e-05, 'epoch': 0.42} {'loss': 0.8115, 'learning_rate': 1.2946417622936512e-05, 'epoch': 0.42} {'loss': 0.832, 'learning_rate': 1.2940462806258696e-05, 'epoch': 0.42} {'loss': 0.2687, 'learning_rate': 1.2934506847991976e-05, 'epoch': 0.42} {'loss': 0.7998, 'learning_rate': 1.2928549750448661e-05, 'epoch': 0.42} {'loss': 0.2587, 'learning_rate': 1.2922591515941498e-05, 'epoch': 0.42} {'loss': 0.7539, 'learning_rate': 1.2916632146783683e-05, 'epoch': 0.42} {'loss': 0.8135, 'learning_rate': 1.2910671645288841e-05, 'epoch': 0.42} [2024-01-31 03:10:34,275] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7451, 'learning_rate': 1.2904710013771054e-05, 'epoch': 0.42} {'loss': 0.7837, 'learning_rate': 1.2898747254544826e-05, 'epoch': 0.42} {'loss': 0.8481, 'learning_rate': 1.2892783369925105e-05, 'epoch': 0.42} {'loss': 0.8003, 'learning_rate': 1.2886818362227283e-05, 'epoch': 0.42} {'loss': 0.2506, 'learning_rate': 1.2880852233767174e-05, 'epoch': 0.42} {'loss': 0.7827, 'learning_rate': 1.2874884986861038e-05, 'epoch': 0.42} {'loss': 0.7988, 'learning_rate': 1.2868916623825561e-05, 'epoch': 0.43} {'loss': 0.7866, 'learning_rate': 1.2862947146977876e-05, 'epoch': 0.43} {'loss': 0.2615, 'learning_rate': 1.2856976558635532e-05, 'epoch': 0.43} {'loss': 0.8022, 'learning_rate': 1.2851004861116519e-05, 'epoch': 0.43} {'loss': 0.7378, 'learning_rate': 1.2845032056739257e-05, 'epoch': 0.43} {'loss': 0.8125, 'learning_rate': 1.2839058147822595e-05, 'epoch': 0.43} {'loss': 0.7871, 'learning_rate': 1.2833083136685803e-05, 'epoch': 0.43} {'loss': 0.7725, 'learning_rate': 1.2827107025648595e-05, 'epoch': 0.43} {'loss': 0.7803, 'learning_rate': 1.2821129817031099e-05, 'epoch': 0.43} {'loss': 0.835, 'learning_rate': 1.2815151513153874e-05, 'epoch': 0.43} {'loss': 0.8721, 'learning_rate': 1.2809172116337903e-05, 'epoch': 0.43} {'loss': 0.8193, 'learning_rate': 1.2803191628904594e-05, 'epoch': 0.43} {'loss': 0.835, 'learning_rate': 1.2797210053175779e-05, 'epoch': 0.43} {'loss': 0.2794, 'learning_rate': 1.2791227391473706e-05, 'epoch': 0.43} {'loss': 0.7915, 'learning_rate': 1.2785243646121059e-05, 'epoch': 0.43} {'loss': 0.8418, 'learning_rate': 1.277925881944093e-05, 'epoch': 0.43} {'loss': 0.7886, 'learning_rate': 1.2773272913756833e-05, 'epoch': 0.43} {'loss': 0.7944, 'learning_rate': 1.2767285931392705e-05, 'epoch': 0.43} {'loss': 0.7681, 'learning_rate': 1.27612978746729e-05, 'epoch': 0.43} {'loss': 0.8145, 'learning_rate': 1.2755308745922182e-05, 'epoch': 0.43} {'loss': 0.7183, 'learning_rate': 1.2749318547465742e-05, 'epoch': 0.43} {'loss': 0.7725, 'learning_rate': 1.2743327281629181e-05, 'epoch': 0.43} [2024-01-31 03:19:16,793] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. 
this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7954, 'learning_rate': 1.2737334950738512e-05, 'epoch': 0.43} [2024-01-31 03:19:34,067] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8071, 'learning_rate': 1.273134155712017e-05, 'epoch': 0.43} {'loss': 0.7891, 'learning_rate': 1.272534710310099e-05, 'epoch': 0.43} {'loss': 0.7944, 'learning_rate': 1.2719351591008228e-05, 'epoch': 0.43} {'loss': 0.8271, 'learning_rate': 1.2713355023169547e-05, 'epoch': 0.43} [2024-01-31 03:20:44,289] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.856, 'learning_rate': 1.2707357401913022e-05, 'epoch': 0.43} [2024-01-31 03:21:11,961] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8247, 'learning_rate': 1.270135872956714e-05, 'epoch': 0.43} {'loss': 0.8369, 'learning_rate': 1.2695359008460785e-05, 'epoch': 0.43} {'loss': 0.8013, 'learning_rate': 1.2689358240923264e-05, 'epoch': 0.43} {'loss': 0.8086, 'learning_rate': 1.2683356429284273e-05, 'epoch': 0.43} {'loss': 0.8076, 'learning_rate': 1.2677353575873926e-05, 'epoch': 0.43} {'loss': 0.7578, 'learning_rate': 1.2671349683022736e-05, 'epoch': 0.43} {'loss': 0.79, 'learning_rate': 1.2665344753061622e-05, 'epoch': 0.43} {'loss': 0.793, 'learning_rate': 1.2659338788321904e-05, 'epoch': 0.43} {'loss': 0.7812, 'learning_rate': 1.2653331791135308e-05, 'epoch': 0.43} {'loss': 0.7646, 'learning_rate': 1.2647323763833952e-05, 'epoch': 0.43} {'loss': 0.7979, 'learning_rate': 1.264131470875036e-05, 'epoch': 0.43} {'loss': 0.8296, 'learning_rate': 1.2635304628217452e-05, 'epoch': 0.43} {'loss': 0.7495, 'learning_rate': 1.2629293524568555e-05, 'epoch': 0.43} {'loss': 0.8584, 'learning_rate': 1.2623281400137383e-05, 'epoch': 0.43} {'loss': 0.792, 'learning_rate': 1.2617268257258051e-05, 'epoch': 0.43} {'loss': 0.2737, 'learning_rate': 1.2611254098265063e-05, 'epoch': 0.43} {'loss': 0.772, 'learning_rate': 1.2605238925493326e-05, 'epoch': 0.43} {'loss': 0.8301, 'learning_rate': 1.2599222741278136e-05, 'epoch': 0.43} {'loss': 0.8008, 'learning_rate': 1.2593205547955185e-05, 'epoch': 0.43} {'loss': 0.7993, 'learning_rate': 1.2587187347860554e-05, 'epoch': 0.43} {'loss': 0.8584, 'learning_rate': 1.2581168143330716e-05, 'epoch': 0.43} {'loss': 0.7798, 'learning_rate': 1.2575147936702531e-05, 'epoch': 0.43} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1556507488.jpg' {'loss': 0.8438, 'learning_rate': 1.2569126730313255e-05, 'epoch': 0.43} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/739714864.jpg' {'loss': 0.8071, 'learning_rate': 1.2563104526500523e-05, 'epoch': 0.43} {'loss': 0.8374, 'learning_rate': 1.2557081327602361e-05, 'epoch': 0.44} {'loss': 0.8257, 'learning_rate': 1.2551057135957187e-05, 'epoch': 0.44} {'loss': 0.264, 'learning_rate': 1.2545031953903796e-05, 'epoch': 0.44} {'loss': 0.8193, 'learning_rate': 1.2539005783781374e-05, 'epoch': 0.44} {'loss': 0.8311, 'learning_rate': 1.2532978627929486e-05, 'epoch': 0.44} {'loss': 0.7593, 'learning_rate': 1.2526950488688083e-05, 'epoch': 0.44} {'loss': 0.8027, 'learning_rate': 1.2520921368397492e-05, 'epoch': 0.44} {'loss': 0.8428, 'learning_rate': 1.2514891269398429e-05, 'epoch': 0.44} {'loss': 0.7812, 'learning_rate': 1.2508860194031986e-05, 'epoch': 0.44} {'loss': 0.8145, 'learning_rate': 1.2502828144639629e-05, 'epoch': 0.44} {'loss': 0.8364, 'learning_rate': 1.2496795123563218e-05, 'epoch': 0.44} {'loss': 0.8267, 'learning_rate': 1.249076113314497e-05, 'epoch': 0.44} {'loss': 0.8521, 'learning_rate': 1.248472617572749e-05, 'epoch': 0.44} {'loss': 0.7637, 'learning_rate': 1.2478690253653756e-05, 'epoch': 0.44} {'loss': 0.7861, 'learning_rate': 1.2472653369267122e-05, 'epoch': 0.44} {'loss': 0.7754, 'learning_rate': 1.2466615524911316e-05, 'epoch': 0.44} {'loss': 0.7866, 'learning_rate': 1.2460576722930432e-05, 'epoch': 0.44} {'loss': 0.7769, 'learning_rate': 1.2454536965668949e-05, 'epoch': 0.44} {'loss': 0.834, 'learning_rate': 1.24484962554717e-05, 'epoch': 0.44} {'loss': 0.8169, 'learning_rate': 1.24424545946839e-05, 'epoch': 
0.44} {'loss': 0.8164, 'learning_rate': 1.2436411985651131e-05, 'epoch': 0.44} {'loss': 0.8623, 'learning_rate': 1.2430368430719342e-05, 'epoch': 0.44} [2024-01-31 03:35:21,755] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8257, 'learning_rate': 1.242432393223485e-05, 'epoch': 0.44} [2024-01-31 03:35:38,775] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7627, 'learning_rate': 1.2418278492544328e-05, 'epoch': 0.44} {'loss': 0.7993, 'learning_rate': 1.2412232113994841e-05, 'epoch': 0.44} {'loss': 0.8711, 'learning_rate': 1.2406184798933786e-05, 'epoch': 0.44} {'loss': 0.8003, 'learning_rate': 1.2400136549708945e-05, 'epoch': 0.44} {'loss': 0.8223, 'learning_rate': 1.239408736866846e-05, 'epoch': 0.44} {'loss': 0.8042, 'learning_rate': 1.2388037258160823e-05, 'epoch': 0.44} {'loss': 0.8076, 'learning_rate': 1.23819862205349e-05, 'epoch': 0.44} {'loss': 0.8315, 'learning_rate': 1.2375934258139917e-05, 'epoch': 0.44} {'loss': 0.771, 'learning_rate': 1.2369881373325448e-05, 'epoch': 0.44} {'loss': 0.8354, 'learning_rate': 1.236382756844143e-05, 'epoch': 0.44} {'loss': 0.7964, 'learning_rate': 1.2357772845838159e-05, 'epoch': 0.44} {'loss': 0.7437, 'learning_rate': 1.2351717207866292e-05, 'epoch': 0.44} {'loss': 0.8086, 'learning_rate': 1.2345660656876832e-05, 'epoch': 0.44} {'loss': 0.7681, 'learning_rate': 1.233960319522114e-05, 'epoch': 0.44} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/810943794.jpg' {'loss': 0.269, 'learning_rate': 1.2333544825250938e-05, 'epoch': 0.44} {'loss': 0.8589, 'learning_rate': 1.2327485549318285e-05, 'epoch': 0.44} {'loss': 0.8164, 'learning_rate': 1.2321425369775601e-05, 'epoch': 0.44} {'loss': 0.8511, 'learning_rate': 1.2315364288975665e-05, 'epoch': 0.44} {'loss': 0.8491, 'learning_rate': 1.2309302309271587e-05, 'epoch': 0.44} {'loss': 0.8633, 'learning_rate': 1.2303239433016842e-05, 'epoch': 0.44} {'loss': 0.7734, 'learning_rate': 1.2297175662565248e-05, 'epoch': 0.44} {'loss': 0.8315, 'learning_rate': 1.229111100027097e-05, 'epoch': 0.44} {'loss': 0.8164, 'learning_rate': 1.228504544848851e-05, 'epoch': 0.44} {'loss': 0.8223, 'learning_rate': 1.2278979009572736e-05, 'epoch': 0.44} {'loss': 0.7778, 'learning_rate': 1.2272911685878841e-05, 'epoch': 0.44} {'loss': 0.7915, 'learning_rate': 1.2266843479762372e-05, 'epoch': 0.44} {'loss': 0.7827, 'learning_rate': 1.2260774393579209e-05, 'epoch': 0.44} {'loss': 0.752, 'learning_rate': 1.2254704429685593e-05, 'epoch': 0.44} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1560443952.jpg' {'loss': 0.7891, 'learning_rate': 1.2248633590438084e-05, 'epoch': 0.44} {'loss': 0.8291, 'learning_rate': 1.2242561878193589e-05, 'epoch': 0.45} {'loss': 0.8047, 'learning_rate': 
1.2236489295309362e-05, 'epoch': 0.45} {'loss': 0.2905, 'learning_rate': 1.2230415844142984e-05, 'epoch': 0.45} {'loss': 0.8169, 'learning_rate': 1.2224341527052378e-05, 'epoch': 0.45} {'loss': 0.8247, 'learning_rate': 1.2218266346395811e-05, 'epoch': 0.45} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/966542606.jpg' {'loss': 0.8379, 'learning_rate': 1.221219030453187e-05, 'epoch': 0.45} {'loss': 0.8125, 'learning_rate': 1.220611340381948e-05, 'epoch': 0.45} {'loss': 0.8306, 'learning_rate': 1.2200035646617912e-05, 'epoch': 0.45} {'loss': 0.8125, 'learning_rate': 1.2193957035286757e-05, 'epoch': 0.45} {'loss': 0.7778, 'learning_rate': 1.2187877572185937e-05, 'epoch': 0.45} {'loss': 0.7974, 'learning_rate': 1.2181797259675713e-05, 'epoch': 0.45} {'loss': 0.8711, 'learning_rate': 1.2175716100116677e-05, 'epoch': 0.45} {'loss': 0.8271, 'learning_rate': 1.2169634095869736e-05, 'epoch': 0.45} {'loss': 0.8472, 'learning_rate': 1.2163551249296132e-05, 'epoch': 0.45} {'loss': 0.7505, 'learning_rate': 1.2157467562757443e-05, 'epoch': 0.45} {'loss': 0.751, 'learning_rate': 1.2151383038615563e-05, 'epoch': 0.45} {'loss': 0.7896, 'learning_rate': 1.214529767923271e-05, 'epoch': 0.45} {'loss': 0.7788, 'learning_rate': 1.2139211486971436e-05, 'epoch': 0.45} {'loss': 0.8398, 'learning_rate': 1.213312446419461e-05, 'epoch': 0.45} {'loss': 0.7646, 'learning_rate': 1.2127036613265418e-05, 'epoch': 0.45} {'loss': 0.7886, 'learning_rate': 1.2120947936547375e-05, 'epoch': 0.45} {'loss': 0.7856, 'learning_rate': 1.2114858436404322e-05, 'epoch': 0.45} {'loss': 0.7817, 'learning_rate': 1.2108768115200405e-05, 'epoch': 0.45} {'loss': 0.2507, 'learning_rate': 1.2102676975300095e-05, 'epoch': 0.45} {'loss': 0.769, 'learning_rate': 1.209658501906819e-05, 'epoch': 0.45} {'loss': 0.7085, 'learning_rate': 1.2090492248869795e-05, 'epoch': 0.45} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/671025627.jpg' {'loss': 0.7998, 'learning_rate': 1.2084398667070325e-05, 'epoch': 0.45} {'loss': 0.7617, 'learning_rate': 1.2078304276035527e-05, 'epoch': 0.45} {'loss': 0.8457, 'learning_rate': 1.2072209078131451e-05, 'epoch': 0.45} {'loss': 0.2582, 'learning_rate': 1.2066113075724461e-05, 'epoch': 0.45} {'loss': 0.8237, 'learning_rate': 1.206001627118124e-05, 'epoch': 0.45} {'loss': 0.8433, 'learning_rate': 1.2053918666868776e-05, 'epoch': 0.45} {'loss': 0.8105, 'learning_rate': 1.2047820265154362e-05, 'epoch': 0.45} {'loss': 0.8281, 'learning_rate': 1.2041721068405614e-05, 'epoch': 0.45} {'loss': 0.8398, 'learning_rate': 1.203562107899045e-05, 'epoch': 0.45} {'loss': 0.7568, 'learning_rate': 1.2029520299277095e-05, 'epoch': 0.45} {'loss': 0.8364, 'learning_rate': 1.2023418731634078e-05, 'epoch': 0.45} {'loss': 0.7773, 'learning_rate': 1.2017316378430244e-05, 'epoch': 0.45} {'loss': 0.8115, 'learning_rate': 1.2011213242034733e-05, 'epoch': 0.45} {'loss': 0.7925, 'learning_rate': 1.2005109324816992e-05, 'epoch': 0.45} {'loss': 0.8096, 'learning_rate': 1.1999004629146775e-05, 'epoch': 0.45} {'loss': 0.7705, 'learning_rate': 1.1992899157394133e-05, 'epoch': 0.45} {'loss': 0.8281, 'learning_rate': 1.1986792911929418e-05, 'epoch': 0.45} {'loss': 0.8032, 'learning_rate': 1.198068589512329e-05, 'epoch': 0.45} {'loss': 0.8037, 'learning_rate': 1.1974578109346702e-05, 'epoch': 0.45} {'loss': 0.7446, 'learning_rate': 1.1968469556970905e-05, 'epoch': 0.45} {'loss': 0.8135, 'learning_rate': 1.1962360240367445e-05, 'epoch': 0.45} {'loss': 0.8081, 'learning_rate': 1.1956250161908179e-05, 'epoch': 
0.45} {'loss': 0.8491, 'learning_rate': 1.195013932396524e-05, 'epoch': 0.45} {'loss': 0.8584, 'learning_rate': 1.1944027728911072e-05, 'epoch': 0.45} {'loss': 0.2844, 'learning_rate': 1.1937915379118406e-05, 'epoch': 0.45} {'loss': 0.77, 'learning_rate': 1.1931802276960265e-05, 'epoch': 0.45} {'loss': 0.8203, 'learning_rate': 1.1925688424809965e-05, 'epoch': 0.46} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/966424603.jpg' {'loss': 0.812, 'learning_rate': 1.1919573825041115e-05, 'epoch': 0.46} {'loss': 0.7407, 'learning_rate': 1.1913458480027614e-05, 'epoch': 0.46} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/312244452.jpg' {'loss': 0.7725, 'learning_rate': 1.1907342392143646e-05, 'epoch': 0.46} {'loss': 0.7969, 'learning_rate': 1.1901225563763694e-05, 'epoch': 0.46} {'loss': 0.791, 'learning_rate': 1.1895107997262516e-05, 'epoch': 0.46} {'loss': 0.7593, 'learning_rate': 1.1888989695015166e-05, 'epoch': 0.46} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/912818034.jpg' {'loss': 0.7642, 'learning_rate': 1.1882870659396968e-05, 'epoch': 0.46} {'loss': 0.7534, 'learning_rate': 1.1876750892783558e-05, 'epoch': 0.46} {'loss': 0.8267, 'learning_rate': 1.1870630397550831e-05, 'epoch': 0.46} {'loss': 0.769, 'learning_rate': 1.1864509176074974e-05, 'epoch': 0.46} {'loss': 0.7793, 'learning_rate': 1.185838723073246e-05, 'epoch': 0.46} {'loss': 0.7964, 'learning_rate': 1.1852264563900038e-05, 'epoch': 0.46} {'loss': 0.2803, 'learning_rate': 1.1846141177954733e-05, 'epoch': 0.46} {'loss': 0.8311, 'learning_rate': 1.1840017075273861e-05, 'epoch': 0.46} {'loss': 0.7876, 'learning_rate': 1.1833892258235008e-05, 'epoch': 0.46} {'loss': 0.7974, 'learning_rate': 1.1827766729216035e-05, 'epoch': 0.46} {'loss': 0.8301, 'learning_rate': 1.1821640490595086e-05, 'epoch': 0.46} {'loss': 0.2587, 'learning_rate': 1.181551354475058e-05, 'epoch': 0.46} {'loss': 0.7881, 'learning_rate': 1.1809385894061206e-05, 'epoch': 0.46} {'loss': 0.8105, 'learning_rate': 1.1803257540905926e-05, 'epoch': 0.46} {'loss': 0.7852, 'learning_rate': 1.1797128487663982e-05, 'epoch': 0.46} {'loss': 0.7495, 'learning_rate': 1.1790998736714882e-05, 'epoch': 0.46} {'loss': 0.8008, 'learning_rate': 1.1784868290438404e-05, 'epoch': 0.46} {'loss': 0.7725, 'learning_rate': 1.1778737151214606e-05, 'epoch': 0.46} {'loss': 0.8091, 'learning_rate': 1.17726053214238e-05, 'epoch': 0.46} {'loss': 0.7603, 'learning_rate': 1.1766472803446577e-05, 'epoch': 0.46} {'loss': 0.8066, 'learning_rate': 1.1760339599663788e-05, 'epoch': 0.46} {'loss': 0.7778, 'learning_rate': 1.1754205712456556e-05, 'epoch': 0.46} [2024-01-31 04:09:34,326] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7666, 'learning_rate': 1.1748071144206266e-05, 'epoch': 0.46} [2024-01-31 04:09:52,710] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7744, 'learning_rate': 1.1741935897294572e-05, 'epoch': 0.46} {'loss': 0.2593, 'learning_rate': 1.1735799974103388e-05, 'epoch': 0.46} [2024-01-31 04:10:28,780] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8301, 'learning_rate': 1.1729663377014888e-05, 'epoch': 0.46} {'loss': 0.7734, 'learning_rate': 1.172352610841151e-05, 'epoch': 0.46} [2024-01-31 04:11:04,780] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.813, 'learning_rate': 1.1717388170675954e-05, 'epoch': 0.46} [2024-01-31 04:11:21,984] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8057, 'learning_rate': 1.1711249566191179e-05, 'epoch': 0.46} {'loss': 0.7632, 'learning_rate': 1.17051102973404e-05, 'epoch': 0.46} {'loss': 0.8472, 'learning_rate': 1.1698970366507096e-05, 'epoch': 0.46} {'loss': 0.7534, 'learning_rate': 1.1692829776074999e-05, 'epoch': 0.46} {'loss': 0.7817, 'learning_rate': 1.1686688528428099e-05, 'epoch': 0.46} {'loss': 0.8105, 'learning_rate': 1.1680546625950635e-05, 'epoch': 0.46} {'loss': 0.7993, 'learning_rate': 1.167440407102711e-05, 'epoch': 0.46} {'loss': 0.8237, 'learning_rate': 1.1668260866042271e-05, 'epoch': 0.46} {'loss': 0.7769, 'learning_rate': 1.1662117013381126e-05, 'epoch': 0.46} {'loss': 0.8486, 'learning_rate': 1.1655972515428928e-05, 'epoch': 0.46} {'loss': 0.7554, 'learning_rate': 1.1649827374571182e-05, 'epoch': 0.46} {'loss': 0.2644, 'learning_rate': 1.1643681593193642e-05, 'epoch': 0.46} [2024-01-31 04:15:13,871] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8003, 'learning_rate': 1.1637535173682318e-05, 'epoch': 0.46} [2024-01-31 04:15:38,130] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. 
this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.814, 'learning_rate': 1.1631388118423457e-05, 'epoch': 0.46} {'loss': 0.7415, 'learning_rate': 1.1625240429803553e-05, 'epoch': 0.46} {'loss': 0.8042, 'learning_rate': 1.1619092110209361e-05, 'epoch': 0.46} {'loss': 0.8467, 'learning_rate': 1.1612943162027863e-05, 'epoch': 0.46} {'loss': 0.7925, 'learning_rate': 1.1606793587646295e-05, 'epoch': 0.47} {'loss': 0.7769, 'learning_rate': 1.160064338945213e-05, 'epoch': 0.47} {'loss': 0.813, 'learning_rate': 1.1594492569833093e-05, 'epoch': 0.47} {'loss': 0.8105, 'learning_rate': 1.1588341131177137e-05, 'epoch': 0.47} {'loss': 0.8062, 'learning_rate': 1.1582189075872467e-05, 'epoch': 0.47} {'loss': 0.7827, 'learning_rate': 1.1576036406307523e-05, 'epoch': 0.47} {'loss': 0.7715, 'learning_rate': 1.156988312487098e-05, 'epoch': 0.47} {'loss': 0.8213, 'learning_rate': 1.1563729233951757e-05, 'epoch': 0.47} {'loss': 0.8037, 'learning_rate': 1.1557574735939003e-05, 'epoch': 0.47} {'loss': 0.7676, 'learning_rate': 1.1551419633222107e-05, 'epoch': 0.47} {'loss': 0.8262, 'learning_rate': 1.1545263928190692e-05, 'epoch': 0.47} {'loss': 0.8311, 'learning_rate': 1.1539107623234618e-05, 'epoch': 0.47} {'loss': 0.2709, 'learning_rate': 1.153295072074397e-05, 'epoch': 0.47} {'loss': 0.8032, 'learning_rate': 1.1526793223109072e-05, 'epoch': 0.47} {'loss': 0.8057, 'learning_rate': 1.1520635132720475e-05, 'epoch': 0.47} {'loss': 0.8433, 'learning_rate': 1.1514476451968961e-05, 'epoch': 0.47} {'loss': 0.7754, 'learning_rate': 1.1508317183245545e-05, 'epoch': 0.47} {'loss': 0.8452, 'learning_rate': 1.1502157328941466e-05, 'epoch': 0.47} {'loss': 0.7759, 'learning_rate': 1.149599689144819e-05, 'epoch': 0.47} {'loss': 0.7979, 'learning_rate': 1.1489835873157414e-05, 'epoch': 0.47} {'loss': 0.7837, 'learning_rate': 1.1483674276461053e-05, 'epoch': 0.47} {'loss': 0.7729, 'learning_rate': 1.1477512103751254e-05, 'epoch': 0.47} {'loss': 0.791, 'learning_rate': 1.1471349357420384e-05, 'epoch': 0.47} {'loss': 0.2622, 'learning_rate': 1.1465186039861033e-05, 'epoch': 0.47} {'loss': 0.793, 'learning_rate': 1.1459022153466016e-05, 'epoch': 0.47} {'loss': 0.7817, 'learning_rate': 1.1452857700628362e-05, 'epoch': 0.47} {'loss': 0.7476, 'learning_rate': 1.1446692683741326e-05, 'epoch': 0.47} {'loss': 0.8213, 'learning_rate': 1.1440527105198377e-05, 'epoch': 0.47} {'loss': 0.8247, 'learning_rate': 1.143436096739321e-05, 'epoch': 0.47} {'loss': 0.7759, 'learning_rate': 1.1428194272719729e-05, 'epoch': 0.47} {'loss': 0.3066, 'learning_rate': 1.1422027023572052e-05, 'epoch': 0.47} {'loss': 0.8174, 'learning_rate': 1.1415859222344525e-05, 'epoch': 0.47} {'loss': 0.8247, 'learning_rate': 1.14096908714317e-05, 'epoch': 0.47} {'loss': 0.8218, 'learning_rate': 1.1403521973228342e-05, 'epoch': 0.47} {'loss': 0.7495, 'learning_rate': 1.1397352530129428e-05, 'epoch': 0.47} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1580910068.jpg' {'loss': 0.8574, 'learning_rate': 1.139118254453015e-05, 'epoch': 0.47} {'loss': 0.7422, 'learning_rate': 1.1385012018825907e-05, 'epoch': 0.47} {'loss': 0.77, 'learning_rate': 1.1378840955412313e-05, 'epoch': 0.47} {'loss': 0.791, 'learning_rate': 
1.1372669356685185e-05, 'epoch': 0.47} {'loss': 0.8486, 'learning_rate': 1.1366497225040549e-05, 'epoch': 0.47} {'loss': 0.7627, 'learning_rate': 1.1360324562874643e-05, 'epoch': 0.47} {'loss': 0.8252, 'learning_rate': 1.1354151372583901e-05, 'epoch': 0.47} {'loss': 0.2456, 'learning_rate': 1.1347977656564974e-05, 'epoch': 0.47} {'loss': 0.8223, 'learning_rate': 1.1341803417214705e-05, 'epoch': 0.47} [2024-01-31 04:30:20,018] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7734, 'learning_rate': 1.1335628656930153e-05, 'epoch': 0.47} [2024-01-31 04:30:37,123] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7627, 'learning_rate': 1.132945337810857e-05, 'epoch': 0.47} {'loss': 0.7397, 'learning_rate': 1.132327758314741e-05, 'epoch': 0.47} {'loss': 0.8013, 'learning_rate': 1.131710127444433e-05, 'epoch': 0.47} {'loss': 0.7646, 'learning_rate': 1.1310924454397187e-05, 'epoch': 0.47} {'loss': 0.8413, 'learning_rate': 1.1304747125404031e-05, 'epoch': 0.47} {'loss': 0.7847, 'learning_rate': 1.129856928986312e-05, 'epoch': 0.47} {'loss': 0.77, 'learning_rate': 1.12923909501729e-05, 'epoch': 0.47} {'loss': 0.7993, 'learning_rate': 1.1286212108732015e-05, 'epoch': 0.48} {'loss': 0.8164, 'learning_rate': 1.1280032767939302e-05, 'epoch': 0.48} {'loss': 0.7935, 'learning_rate': 1.1273852930193798e-05, 'epoch': 0.48} {'loss': 0.855, 'learning_rate': 1.1267672597894725e-05, 'epoch': 0.48} {'loss': 0.8242, 'learning_rate': 1.12614917734415e-05, 'epoch': 0.48} {'loss': 0.7588, 'learning_rate': 1.1255310459233737e-05, 'epoch': 0.48} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/299130002.jpg' {'loss': 0.8091, 'learning_rate': 1.1249128657671233e-05, 'epoch': 0.48} {'loss': 0.7886, 'learning_rate': 1.1242946371153974e-05, 'epoch': 0.48} {'loss': 0.7739, 'learning_rate': 1.1236763602082136e-05, 'epoch': 0.48} {'loss': 0.793, 'learning_rate': 1.1230580352856088e-05, 'epoch': 0.48} {'loss': 0.7573, 'learning_rate': 1.1224396625876375e-05, 'epoch': 0.48} {'loss': 0.7534, 'learning_rate': 1.1218212423543734e-05, 'epoch': 0.48} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/893148423.jpg' {'loss': 0.8218, 'learning_rate': 1.1212027748259086e-05, 'epoch': 0.48} {'loss': 0.7358, 'learning_rate': 1.1205842602423537e-05, 'epoch': 0.48} {'loss': 0.7061, 'learning_rate': 1.1199656988438373e-05, 'epoch': 0.48} {'loss': 0.7324, 'learning_rate': 1.1193470908705055e-05, 'epoch': 0.48} {'loss': 0.8647, 'learning_rate': 1.1187284365625241e-05, 'epoch': 0.48} {'loss': 0.7537, 'learning_rate': 1.1181097361600754e-05, 'epoch': 0.48} {'loss': 0.8154, 'learning_rate': 1.1174909899033608e-05, 'epoch': 0.48} {'loss': 0.8491, 'learning_rate': 1.1168721980325987e-05, 'epoch': 0.48} 
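The scattered "[Errno 2] No such file or directory: './playground/data/ocr_vqa/images/...'" entries mean some OCR-VQA images referenced by the annotation file were never downloaded, so those samples are effectively lost. A small pre-flight sketch for listing the missing files so the dataset can be filtered before launching the run; the annotation filename and the "image" field layout are assumptions about a LLaVA-style JSON, not verified against this setup.

```python
# Illustrative pre-flight check: list annotation entries whose image file is
# missing under ./playground/data. The annotation path and the "image" field
# name are assumptions about a LLaVA-style JSON, not verified here.
import json
from pathlib import Path

ann_path = Path("./playground/data/llava_v1_5_mix665k.json")  # assumed filename
image_root = Path("./playground/data")

records = json.loads(ann_path.read_text())
missing = [r["image"] for r in records
           if "image" in r and not (image_root / r["image"]).exists()]

print(f"{len(missing)} of {len(records)} samples point to missing images")
for rel in missing[:20]:
    print("missing:", rel)
```

Running a check like this once up front avoids paying for the missing-file lookups (and the silently skipped samples) on every epoch.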
{'loss': 0.7715, 'learning_rate': 1.1162533607880251e-05, 'epoch': 0.48} {'loss': 0.7876, 'learning_rate': 1.1156344784098942e-05, 'epoch': 0.48} {'loss': 0.8096, 'learning_rate': 1.1150155511384772e-05, 'epoch': 0.48} {'loss': 0.7729, 'learning_rate': 1.1143965792140631e-05, 'epoch': 0.48} {'loss': 0.7949, 'learning_rate': 1.1137775628769584e-05, 'epoch': 0.48} {'loss': 0.7974, 'learning_rate': 1.1131585023674863e-05, 'epoch': 0.48} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/051763547X.jpg' {'loss': 0.8325, 'learning_rate': 1.1125393979259874e-05, 'epoch': 0.48} {'loss': 0.7935, 'learning_rate': 1.1119202497928192e-05, 'epoch': 0.48} {'loss': 0.7637, 'learning_rate': 1.1113010582083568e-05, 'epoch': 0.48} {'loss': 0.8525, 'learning_rate': 1.1106818234129913e-05, 'epoch': 0.48} {'loss': 0.77, 'learning_rate': 1.1100625456471307e-05, 'epoch': 0.48} {'loss': 0.813, 'learning_rate': 1.1094432251512006e-05, 'epoch': 0.48} {'loss': 0.7705, 'learning_rate': 1.1088238621656422e-05, 'epoch': 0.48} {'loss': 0.8071, 'learning_rate': 1.1082044569309138e-05, 'epoch': 0.48} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/785270965.jpg' {'loss': 0.7144, 'learning_rate': 1.1075850096874894e-05, 'epoch': 0.48} {'loss': 0.7891, 'learning_rate': 1.1069655206758603e-05, 'epoch': 0.48} {'loss': 0.791, 'learning_rate': 1.1063459901365325e-05, 'epoch': 0.48} {'loss': 0.2852, 'learning_rate': 1.1057264183100303e-05, 'epoch': 0.48} {'loss': 0.8022, 'learning_rate': 1.1051068054368921e-05, 'epoch': 0.48} {'loss': 0.7485, 'learning_rate': 1.104487151757673e-05, 'epoch': 0.48} {'loss': 0.855, 'learning_rate': 1.1038674575129442e-05, 'epoch': 0.48} {'loss': 0.8086, 'learning_rate': 1.1032477229432921e-05, 'epoch': 0.48} {'loss': 0.8618, 'learning_rate': 1.1026279482893187e-05, 'epoch': 0.48} [2024-01-31 04:45:59,573] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.834, 'learning_rate': 1.1020081337916425e-05, 'epoch': 0.48} [2024-01-31 04:46:17,485] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8203, 'learning_rate': 1.1013882796908963e-05, 'epoch': 0.48} {'loss': 0.7988, 'learning_rate': 1.1007683862277292e-05, 'epoch': 0.48} {'loss': 0.8193, 'learning_rate': 1.1001484536428052e-05, 'epoch': 0.48} {'loss': 0.791, 'learning_rate': 1.0995284821768029e-05, 'epoch': 0.48} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/078710339X.jpg' {'loss': 0.7485, 'learning_rate': 1.098908472070417e-05, 'epoch': 0.48} {'loss': 0.8037, 'learning_rate': 1.0982884235643567e-05, 'epoch': 0.48} {'loss': 0.8594, 'learning_rate': 1.0976683368993464e-05, 'epoch': 0.48} {'loss': 0.8081, 'learning_rate': 1.0970482123161249e-05, 'epoch': 0.48} {'loss': 0.6821, 'learning_rate': 1.0964280500554459e-05, 'epoch': 0.49} {'loss': 0.8096, 'learning_rate': 1.0958078503580776e-05, 'epoch': 0.49} {'loss': 0.7024, 'learning_rate': 1.0951876134648032e-05, 'epoch': 0.49} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/930410629.jpg' {'loss': 0.7935, 'learning_rate': 1.0945673396164198e-05, 'epoch': 0.49} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/664220789.jpg' [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/157851231X.jpg' {'loss': 0.771, 'learning_rate': 1.0939470290537389e-05, 'epoch': 0.49} {'loss': 0.8247, 'learning_rate': 1.0933266820175868e-05, 'epoch': 0.49} {'loss': 0.7998, 'learning_rate': 1.0927062987488035e-05, 'epoch': 0.49} {'loss': 0.8438, 'learning_rate': 1.0920858794882429e-05, 'epoch': 0.49} {'loss': 0.8311, 'learning_rate': 1.0914654244767736e-05, 'epoch': 0.49} {'loss': 0.7832, 'learning_rate': 1.0908449339552769e-05, 'epoch': 0.49} {'loss': 0.8184, 'learning_rate': 1.0902244081646489e-05, 'epoch': 0.49} {'loss': 0.8037, 'learning_rate': 1.0896038473457993e-05, 'epoch': 0.49} {'loss': 0.8208, 'learning_rate': 1.0889832517396511e-05, 'epoch': 0.49} {'loss': 0.8086, 'learning_rate': 1.0883626215871408e-05, 'epoch': 0.49} {'loss': 0.8359, 'learning_rate': 1.0877419571292183e-05, 'epoch': 0.49} {'loss': 0.7617, 'learning_rate': 1.0871212586068469e-05, 'epoch': 0.49} {'loss': 0.7827, 'learning_rate': 1.0865005262610033e-05, 'epoch': 0.49} {'loss': 0.8242, 'learning_rate': 1.085879760332677e-05, 'epoch': 0.49} {'loss': 0.7671, 'learning_rate': 1.085258961062871e-05, 'epoch': 0.49} {'loss': 0.7705, 'learning_rate': 1.0846381286926007e-05, 'epoch': 0.49} {'loss': 0.8394, 'learning_rate': 1.0840172634628948e-05, 'epoch': 0.49} {'loss': 0.7935, 'learning_rate': 1.0833963656147944e-05, 'epoch': 0.49} {'loss': 0.8042, 'learning_rate': 1.082775435389353e-05, 'epoch': 0.49} {'loss': 0.835, 'learning_rate': 1.0821544730276379e-05, 'epoch': 0.49} {'loss': 0.7979, 'learning_rate': 1.0815334787707277e-05, 'epoch': 0.49} {'loss': 0.8174, 'learning_rate': 1.0809124528597138e-05, 'epoch': 0.49} {'loss': 0.7915, 'learning_rate': 1.0802913955356998e-05, 'epoch': 0.49} {'loss': 0.813, 'learning_rate': 1.0796703070398016e-05, 'epoch': 0.49} {'loss': 0.7524, 'learning_rate': 1.079049187613147e-05, 'epoch': 0.49} {'loss': 0.7979, 'learning_rate': 1.0784280374968761e-05, 'epoch': 0.49} {'loss': 0.8086, 'learning_rate': 1.0778068569321403e-05, 'epoch': 0.49} {'loss': 0.8218, 'learning_rate': 1.077185646160104e-05, 'epoch': 0.49} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/894550225.jpg' {'loss': 0.7209, 
'learning_rate': 1.0765644054219422e-05, 'epoch': 0.49} {'loss': 0.8301, 'learning_rate': 1.0759431349588421e-05, 'epoch': 0.49} {'loss': 0.813, 'learning_rate': 1.0753218350120023e-05, 'epoch': 0.49} {'loss': 0.8062, 'learning_rate': 1.0747005058226325e-05, 'epoch': 0.49} {'loss': 0.2723, 'learning_rate': 1.0740791476319543e-05, 'epoch': 0.49} {'loss': 0.8164, 'learning_rate': 1.0734577606812007e-05, 'epoch': 0.49} {'loss': 0.7778, 'learning_rate': 1.0728363452116149e-05, 'epoch': 0.49} {'loss': 0.812, 'learning_rate': 1.0722149014644523e-05, 'epoch': 0.49} {'loss': 0.7725, 'learning_rate': 1.0715934296809782e-05, 'epoch': 0.49} {'loss': 0.7974, 'learning_rate': 1.0709719301024698e-05, 'epoch': 0.49} {'loss': 0.7261, 'learning_rate': 1.0703504029702148e-05, 'epoch': 0.49} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/761500413.jpg' [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/155583468X.jpg' {'loss': 0.79, 'learning_rate': 1.0697288485255107e-05, 'epoch': 0.49} {'loss': 0.7573, 'learning_rate': 1.0691072670096669e-05, 'epoch': 0.49} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/668053984.jpg' [2024-01-31 05:02:34,263] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7847, 'learning_rate': 1.0684856586640026e-05, 'epoch': 0.49} [2024-01-31 05:02:52,977] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.853, 'learning_rate': 1.0678640237298476e-05, 'epoch': 0.49} [2024-01-31 05:03:11,539] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.793, 'learning_rate': 1.0672423624485423e-05, 'epoch': 0.49} [2024-01-31 05:03:29,476] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8491, 'learning_rate': 1.0666206750614363e-05, 'epoch': 0.49} {'loss': 0.8042, 'learning_rate': 1.0659989618098904e-05, 'epoch': 0.49} {'loss': 0.7983, 'learning_rate': 1.065377222935275e-05, 'epoch': 0.49} {'loss': 0.7905, 'learning_rate': 1.0647554586789708e-05, 'epoch': 0.49} {'loss': 0.7827, 'learning_rate': 1.064133669282368e-05, 'epoch': 0.5} {'loss': 0.748, 'learning_rate': 1.0635118549868668e-05, 'epoch': 0.5} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/811726819.jpg' {'loss': 0.7571, 'learning_rate': 1.0628900160338764e-05, 'epoch': 0.5} {'loss': 0.7476, 'learning_rate': 1.0622681526648167e-05, 'epoch': 0.5} {'loss': 0.854, 'learning_rate': 1.0616462651211156e-05, 'epoch': 0.5} {'loss': 0.7935, 'learning_rate': 1.0610243536442125e-05, 'epoch': 0.5} {'loss': 0.7812, 'learning_rate': 1.0604024184755539e-05, 'epoch': 0.5} {'loss': 0.7388, 'learning_rate': 1.0597804598565969e-05, 'epoch': 0.5} {'loss': 0.7568, 'learning_rate': 1.0591584780288069e-05, 'epoch': 0.5} {'loss': 0.7798, 'learning_rate': 1.0585364732336587e-05, 'epoch': 0.5} {'loss': 0.7598, 'learning_rate': 1.0579144457126365e-05, 'epoch': 0.5} {'loss': 0.8188, 'learning_rate': 1.057292395707232e-05, 'epoch': 0.5} {'loss': 0.8164, 'learning_rate': 1.0566703234589471e-05, 'epoch': 0.5} {'loss': 0.7881, 'learning_rate': 1.0560482292092912e-05, 'epoch': 0.5} {'loss': 0.7725, 'learning_rate': 1.0554261131997833e-05, 'epoch': 0.5} {'loss': 0.8032, 'learning_rate': 1.0548039756719497e-05, 'epoch': 0.5} {'loss': 0.7529, 'learning_rate': 1.054181816867326e-05, 'epoch': 0.5} {'loss': 0.7432, 'learning_rate': 1.053559637027455e-05, 'epoch': 0.5} {'loss': 0.7905, 'learning_rate': 1.0529374363938888e-05, 'epoch': 0.5} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/930016238.jpg' {'loss': 0.8623, 'learning_rate': 1.0523152152081875e-05, 'epoch': 0.5} {'loss': 0.7534, 'learning_rate': 1.051692973711918e-05, 'epoch': 0.5} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/2067009559.jpg' [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/28628594.jpg' {'loss': 0.8237, 'learning_rate': 1.0510707121466568e-05, 'epoch': 0.5} {'loss': 0.834, 'learning_rate': 1.0504484307539864e-05, 'epoch': 0.5} {'loss': 0.8442, 'learning_rate': 1.0498261297754984e-05, 'epoch': 0.5} {'loss': 0.7842, 'learning_rate': 1.0492038094527907e-05, 'epoch': 0.5} {'loss': 0.772, 'learning_rate': 1.0485814700274706e-05, 'epoch': 0.5} {'loss': 0.751, 'learning_rate': 1.047959111741151e-05, 'epoch': 0.5} {'loss': 0.8018, 'learning_rate': 1.0473367348354529e-05, 'epoch': 0.5} {'loss': 0.8125, 'learning_rate': 1.0467143395520044e-05, 'epoch': 0.5} {'loss': 0.2682, 'learning_rate': 1.046091926132441e-05, 'epoch': 0.5} {'loss': 0.8413, 'learning_rate': 1.0454694948184045e-05, 'epoch': 0.5} {'loss': 0.7803, 'learning_rate': 1.044847045851545e-05, 'epoch': 0.5} {'loss': 0.7734, 'learning_rate': 1.044224579473518e-05, 'epoch': 0.5} {'loss': 0.8145, 'learning_rate': 1.0436020959259862e-05, 'epoch': 0.5} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/789404427.jpg' {'loss': 0.811, 'learning_rate': 1.0429795954506203e-05, 'epoch': 0.5} {'loss': 0.7944, 'learning_rate': 1.0423570782890951e-05, 'epoch': 0.5} {'loss': 0.8657, 'learning_rate': 1.0417345446830938e-05, 'epoch': 
0.5} {'loss': 0.7063, 'learning_rate': 1.0411119948743052e-05, 'epoch': 0.5} {'loss': 0.7725, 'learning_rate': 1.0404894291044247e-05, 'epoch': 0.5} {'loss': 0.7812, 'learning_rate': 1.0398668476151538e-05, 'epoch': 0.5} {'loss': 0.7402, 'learning_rate': 1.0392442506482e-05, 'epoch': 0.5} {'loss': 0.7886, 'learning_rate': 1.038621638445277e-05, 'epoch': 0.5} {'loss': 0.2611, 'learning_rate': 1.037999011248104e-05, 'epoch': 0.5} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/914625179.jpg' {'loss': 0.811, 'learning_rate': 1.0373763692984062e-05, 'epoch': 0.5} {'loss': 0.7686, 'learning_rate': 1.0367537128379154e-05, 'epoch': 0.5} {'loss': 0.8462, 'learning_rate': 1.0361310421083677e-05, 'epoch': 0.5} {'loss': 0.8613, 'learning_rate': 1.0355083573515052e-05, 'epoch': 0.5} {'loss': 0.7422, 'learning_rate': 1.0348856588090764e-05, 'epoch': 0.5} {'loss': 0.7354, 'learning_rate': 1.0342629467228331e-05, 'epoch': 0.5} {'loss': 0.7764, 'learning_rate': 1.0336402213345345e-05, 'epoch': 0.5} {'loss': 0.8184, 'learning_rate': 1.0330174828859434e-05, 'epoch': 0.5} {'loss': 0.7964, 'learning_rate': 1.0323947316188288e-05, 'epoch': 0.51} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1574770225.jpg' {'loss': 0.792, 'learning_rate': 1.031771967774964e-05, 'epoch': 0.51} {'loss': 0.8008, 'learning_rate': 1.0311491915961271e-05, 'epoch': 0.51} {'loss': 0.2584, 'learning_rate': 1.030526403324102e-05, 'epoch': 0.51} {'loss': 0.7593, 'learning_rate': 1.0299036032006759e-05, 'epoch': 0.51} {'loss': 0.7729, 'learning_rate': 1.0292807914676412e-05, 'epoch': 0.51} {'loss': 0.2588, 'learning_rate': 1.0286579683667952e-05, 'epoch': 0.51} [2024-01-31 05:22:27,529] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7583, 'learning_rate': 1.0280351341399392e-05, 'epoch': 0.51} {'loss': 0.835, 'learning_rate': 1.027412289028879e-05, 'epoch': 0.51} {'loss': 0.7856, 'learning_rate': 1.0267894332754243e-05, 'epoch': 0.51} {'loss': 0.7554, 'learning_rate': 1.0261665671213891e-05, 'epoch': 0.51} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1565540581.jpg' {'loss': 0.7759, 'learning_rate': 1.0255436908085919e-05, 'epoch': 0.51} {'loss': 0.7485, 'learning_rate': 1.024920804578854e-05, 'epoch': 0.51} {'loss': 0.8413, 'learning_rate': 1.0242979086740019e-05, 'epoch': 0.51} [2024-01-31 05:24:40,866] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7842, 'learning_rate': 1.023675003335865e-05, 'epoch': 0.51} [2024-01-31 05:25:01,822] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. 
if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7983, 'learning_rate': 1.0230520888062765e-05, 'epoch': 0.51} {'loss': 0.751, 'learning_rate': 1.0224291653270739e-05, 'epoch': 0.51} {'loss': 0.8643, 'learning_rate': 1.0218062331400969e-05, 'epoch': 0.51} {'loss': 0.7983, 'learning_rate': 1.0211832924871889e-05, 'epoch': 0.51} {'loss': 0.8472, 'learning_rate': 1.0205603436101978e-05, 'epoch': 0.51} {'loss': 0.7773, 'learning_rate': 1.0199373867509734e-05, 'epoch': 0.51} {'loss': 0.7759, 'learning_rate': 1.019314422151369e-05, 'epoch': 0.51} {'loss': 0.7344, 'learning_rate': 1.0186914500532408e-05, 'epoch': 0.51} {'loss': 0.8105, 'learning_rate': 1.0180684706984483e-05, 'epoch': 0.51} {'loss': 0.7642, 'learning_rate': 1.0174454843288533e-05, 'epoch': 0.51} {'loss': 0.7832, 'learning_rate': 1.0168224911863205e-05, 'epoch': 0.51} {'loss': 0.7671, 'learning_rate': 1.0161994915127173e-05, 'epoch': 0.51} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/525941290.jpg' {'loss': 0.7725, 'learning_rate': 1.015576485549914e-05, 'epoch': 0.51} {'loss': 0.7817, 'learning_rate': 1.0149534735397823e-05, 'epoch': 0.51} {'loss': 0.8037, 'learning_rate': 1.0143304557241979e-05, 'epoch': 0.51} {'loss': 0.7832, 'learning_rate': 1.0137074323450372e-05, 'epoch': 0.51} {'loss': 0.2567, 'learning_rate': 1.0130844036441787e-05, 'epoch': 0.51} {'loss': 0.7495, 'learning_rate': 1.0124613698635043e-05, 'epoch': 0.51} {'loss': 0.811, 'learning_rate': 1.0118383312448973e-05, 'epoch': 0.51} {'loss': 0.7837, 'learning_rate': 1.0112152880302426e-05, 'epoch': 0.51} {'loss': 0.2573, 'learning_rate': 1.0105922404614265e-05, 'epoch': 0.51} {'loss': 0.8076, 'learning_rate': 1.0099691887803385e-05, 'epoch': 0.51} {'loss': 0.7627, 'learning_rate': 1.0093461332288678e-05, 'epoch': 0.51} {'loss': 0.2832, 'learning_rate': 1.0087230740489065e-05, 'epoch': 0.51} {'loss': 0.7788, 'learning_rate': 1.0081000114823473e-05, 'epoch': 0.51} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/051770353X.jpg' {'loss': 0.7402, 'learning_rate': 1.007476945771085e-05, 'epoch': 0.51} {'loss': 0.8213, 'learning_rate': 1.006853877157015e-05, 'epoch': 0.51} {'loss': 0.7798, 'learning_rate': 1.0062308058820337e-05, 'epoch': 0.51} {'loss': 0.8457, 'learning_rate': 1.0056077321880393e-05, 'epoch': 0.51} {'loss': 0.7578, 'learning_rate': 1.0049846563169297e-05, 'epoch': 0.51} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/679888268.jpg' {'loss': 0.8018, 'learning_rate': 1.0043615785106051e-05, 'epoch': 0.51} {'loss': 0.7812, 'learning_rate': 1.0037384990109658e-05, 'epoch': 0.51} {'loss': 0.8003, 'learning_rate': 1.0031154180599123e-05, 'epoch': 0.51} {'loss': 0.2716, 'learning_rate': 1.0024923358993458e-05, 'epoch': 0.51} [2024-01-31 05:35:30,243] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7832, 'learning_rate': 1.0018692527711695e-05, 'epoch': 0.51} {'loss': 0.8052, 'learning_rate': 1.0012461689172846e-05, 'epoch': 0.51} {'loss': 0.7988, 'learning_rate': 1.0006230845795937e-05, 'epoch': 0.51} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1564582914.jpg' {'loss': 0.835, 'learning_rate': 1e-05, 'epoch': 0.52} {'loss': 0.7812, 'learning_rate': 9.993769154204063e-06, 'epoch': 0.52} {'loss': 0.7744, 'learning_rate': 9.987538310827159e-06, 'epoch': 0.52} {'loss': 0.7349, 'learning_rate': 9.981307472288308e-06, 'epoch': 0.52} {'loss': 0.7822, 'learning_rate': 9.975076641006542e-06, 'epoch': 0.52} {'loss': 0.8604, 'learning_rate': 9.968845819400883e-06, 'epoch': 0.52} {'loss': 0.7358, 'learning_rate': 9.962615009890346e-06, 'epoch': 0.52} {'loss': 0.8823, 'learning_rate': 9.956384214893949e-06, 'epoch': 0.52} {'loss': 0.7925, 'learning_rate': 9.950153436830707e-06, 'epoch': 0.52} {'loss': 0.7769, 'learning_rate': 9.94392267811961e-06, 'epoch': 0.52} {'loss': 0.8359, 'learning_rate': 9.937691941179665e-06, 'epoch': 0.52} {'loss': 0.7593, 'learning_rate': 9.931461228429856e-06, 'epoch': 0.52} {'loss': 0.7954, 'learning_rate': 9.925230542289151e-06, 'epoch': 0.52} {'loss': 0.7715, 'learning_rate': 9.91899988517653e-06, 'epoch': 0.52} {'loss': 0.8281, 'learning_rate': 9.912769259510938e-06, 'epoch': 0.52} {'loss': 0.7852, 'learning_rate': 9.906538667711324e-06, 'epoch': 0.52} {'loss': 0.7935, 'learning_rate': 9.90030811219662e-06, 'epoch': 0.52} {'loss': 0.7998, 'learning_rate': 9.894077595385736e-06, 'epoch': 0.52} {'loss': 0.7549, 'learning_rate': 9.887847119697577e-06, 'epoch': 0.52} {'loss': 0.8066, 'learning_rate': 9.881616687551032e-06, 'epoch': 0.52} {'loss': 0.7222, 'learning_rate': 9.875386301364958e-06, 'epoch': 0.52} {'loss': 0.7139, 'learning_rate': 9.869155963558215e-06, 'epoch': 0.52} {'loss': 0.8179, 'learning_rate': 9.862925676549635e-06, 'epoch': 0.52} {'loss': 0.791, 'learning_rate': 9.856695442758023e-06, 'epoch': 0.52} {'loss': 0.8315, 'learning_rate': 9.850465264602175e-06, 'epoch': 0.52} {'loss': 0.8276, 'learning_rate': 9.844235144500865e-06, 'epoch': 0.52} {'loss': 0.7964, 'learning_rate': 9.83800508487283e-06, 'epoch': 0.52} {'loss': 0.7598, 'learning_rate': 9.831775088136797e-06, 'epoch': 0.52} {'loss': 0.7925, 'learning_rate': 9.82554515671147e-06, 'epoch': 0.52} {'loss': 0.7769, 'learning_rate': 9.819315293015519e-06, 'epoch': 0.52} {'loss': 0.8179, 'learning_rate': 9.813085499467594e-06, 'epoch': 0.52} {'loss': 0.8022, 'learning_rate': 9.806855778486314e-06, 'epoch': 0.52} {'loss': 0.7495, 'learning_rate': 9.800626132490268e-06, 'epoch': 0.52} {'loss': 0.7852, 'learning_rate': 9.794396563898022e-06, 'epoch': 0.52} {'loss': 0.7866, 'learning_rate': 9.788167075128113e-06, 'epoch': 0.52} {'loss': 0.8042, 'learning_rate': 9.781937668599035e-06, 'epoch': 0.52} {'loss': 0.7905, 'learning_rate': 9.775708346729263e-06, 'epoch': 0.52} {'loss': 0.7632, 'learning_rate': 9.769479111937238e-06, 'epoch': 0.52} {'loss': 0.7891, 'learning_rate': 9.763249966641352e-06, 'epoch': 0.52} {'loss': 0.7729, 'learning_rate': 9.757020913259986e-06, 'epoch': 0.52} {'loss': 0.7534, 'learning_rate': 9.750791954211464e-06, 'epoch': 0.52} {'loss': 0.8042, 'learning_rate': 9.744563091914085e-06, 'epoch': 0.52} {'loss': 0.7793, 'learning_rate': 9.738334328786114e-06, 
'epoch': 0.52} {'loss': 0.7969, 'learning_rate': 9.732105667245759e-06, 'epoch': 0.52} {'loss': 0.7432, 'learning_rate': 9.725877109711212e-06, 'epoch': 0.52} {'loss': 0.8286, 'learning_rate': 9.719648658600611e-06, 'epoch': 0.52} {'loss': 0.772, 'learning_rate': 9.71342031633205e-06, 'epoch': 0.52} {'loss': 0.7686, 'learning_rate': 9.70719208532359e-06, 'epoch': 0.52} {'loss': 0.7441, 'learning_rate': 9.700963967993246e-06, 'epoch': 0.52} {'loss': 0.7549, 'learning_rate': 9.694735966758982e-06, 'epoch': 0.52} {'loss': 0.8096, 'learning_rate': 9.688508084038729e-06, 'epoch': 0.52} {'loss': 0.8027, 'learning_rate': 9.682280322250365e-06, 'epoch': 0.52} {'loss': 0.7651, 'learning_rate': 9.676052683811715e-06, 'epoch': 0.53} {'loss': 0.8496, 'learning_rate': 9.669825171140568e-06, 'epoch': 0.53} {'loss': 0.7715, 'learning_rate': 9.66359778665466e-06, 'epoch': 0.53} {'loss': 0.835, 'learning_rate': 9.657370532771672e-06, 'epoch': 0.53} {'loss': 0.7754, 'learning_rate': 9.651143411909241e-06, 'epoch': 0.53} {'loss': 0.8071, 'learning_rate': 9.64491642648495e-06, 'epoch': 0.53} {'loss': 0.7778, 'learning_rate': 9.638689578916326e-06, 'epoch': 0.53} {'loss': 0.7646, 'learning_rate': 9.632462871620847e-06, 'epoch': 0.53} {'loss': 0.7642, 'learning_rate': 9.62623630701594e-06, 'epoch': 0.53} {'loss': 0.7417, 'learning_rate': 9.620009887518963e-06, 'epoch': 0.53} {'loss': 0.7627, 'learning_rate': 9.613783615547233e-06, 'epoch': 0.53} {'loss': 0.7749, 'learning_rate': 9.607557493518006e-06, 'epoch': 0.53} {'loss': 0.79, 'learning_rate': 9.601331523848464e-06, 'epoch': 0.53} {'loss': 0.7612, 'learning_rate': 9.595105708955758e-06, 'epoch': 0.53} {'loss': 0.2969, 'learning_rate': 9.588880051256951e-06, 'epoch': 0.53} {'loss': 0.8022, 'learning_rate': 9.582654553169064e-06, 'epoch': 0.53} {'loss': 0.8311, 'learning_rate': 9.576429217109054e-06, 'epoch': 0.53} {'loss': 0.772, 'learning_rate': 9.5702040454938e-06, 'epoch': 0.53} {'loss': 0.7764, 'learning_rate': 9.563979040740138e-06, 'epoch': 0.53} {'loss': 0.8037, 'learning_rate': 9.557754205264826e-06, 'epoch': 0.53} {'loss': 0.8174, 'learning_rate': 9.551529541484554e-06, 'epoch': 0.53} {'loss': 0.8179, 'learning_rate': 9.545305051815957e-06, 'epoch': 0.53} {'loss': 0.7344, 'learning_rate': 9.539080738675597e-06, 'epoch': 0.53} {'loss': 0.8042, 'learning_rate': 9.53285660447996e-06, 'epoch': 0.53} {'loss': 0.7646, 'learning_rate': 9.526632651645476e-06, 'epoch': 0.53} {'loss': 0.7554, 'learning_rate': 9.520408882588497e-06, 'epoch': 0.53} {'loss': 0.832, 'learning_rate': 9.514185299725299e-06, 'epoch': 0.53} {'loss': 0.835, 'learning_rate': 9.507961905472093e-06, 'epoch': 0.53} {'loss': 0.8315, 'learning_rate': 9.501738702245023e-06, 'epoch': 0.53} {'loss': 0.7578, 'learning_rate': 9.495515692460138e-06, 'epoch': 0.53} {'loss': 0.7622, 'learning_rate': 9.489292878533436e-06, 'epoch': 0.53} {'loss': 0.7944, 'learning_rate': 9.483070262880823e-06, 'epoch': 0.53} {'loss': 0.7583, 'learning_rate': 9.476847847918126e-06, 'epoch': 0.53} {'loss': 0.2758, 'learning_rate': 9.47062563606111e-06, 'epoch': 0.53} {'loss': 0.811, 'learning_rate': 9.464403629725454e-06, 'epoch': 0.53} {'loss': 0.835, 'learning_rate': 9.458181831326744e-06, 'epoch': 0.53} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/938076140.jpg' {'loss': 0.7124, 'learning_rate': 9.451960243280506e-06, 'epoch': 0.53} {'loss': 0.7969, 'learning_rate': 9.44573886800217e-06, 'epoch': 0.53} {'loss': 0.7749, 'learning_rate': 9.43951770790709e-06, 'epoch': 0.53} {'loss': 0.8394, 
'learning_rate': 9.433296765410534e-06, 'epoch': 0.53} {'loss': 0.7783, 'learning_rate': 9.427076042927683e-06, 'epoch': 0.53} {'loss': 0.8057, 'learning_rate': 9.420855542873638e-06, 'epoch': 0.53} {'loss': 0.7822, 'learning_rate': 9.414635267663416e-06, 'epoch': 0.53} {'loss': 0.7935, 'learning_rate': 9.408415219711934e-06, 'epoch': 0.53} {'loss': 0.769, 'learning_rate': 9.402195401434036e-06, 'epoch': 0.53} {'loss': 0.8286, 'learning_rate': 9.395975815244468e-06, 'epoch': 0.53} {'loss': 0.8403, 'learning_rate': 9.389756463557878e-06, 'epoch': 0.53} {'loss': 0.267, 'learning_rate': 9.383537348788844e-06, 'epoch': 0.53} {'loss': 0.752, 'learning_rate': 9.377318473351838e-06, 'epoch': 0.53} {'loss': 0.7397, 'learning_rate': 9.371099839661238e-06, 'epoch': 0.53} {'loss': 0.8052, 'learning_rate': 9.364881450131335e-06, 'epoch': 0.53} {'loss': 0.7998, 'learning_rate': 9.358663307176323e-06, 'epoch': 0.53} {'loss': 0.8223, 'learning_rate': 9.352445413210294e-06, 'epoch': 0.54} {'loss': 0.7939, 'learning_rate': 9.346227770647251e-06, 'epoch': 0.54} {'loss': 0.7993, 'learning_rate': 9.3400103819011e-06, 'epoch': 0.54} {'loss': 0.7695, 'learning_rate': 9.33379324938564e-06, 'epoch': 0.54} {'loss': 0.7935, 'learning_rate': 9.327576375514582e-06, 'epoch': 0.54} {'loss': 0.7861, 'learning_rate': 9.321359762701527e-06, 'epoch': 0.54} {'loss': 0.7505, 'learning_rate': 9.315143413359975e-06, 'epoch': 0.54} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/810955563.jpg' {'loss': 0.7764, 'learning_rate': 9.308927329903333e-06, 'epoch': 0.54} {'loss': 0.8145, 'learning_rate': 9.302711514744897e-06, 'epoch': 0.54} {'loss': 0.8145, 'learning_rate': 9.296495970297855e-06, 'epoch': 0.54} {'loss': 0.7754, 'learning_rate': 9.290280698975307e-06, 'epoch': 0.54} {'loss': 0.8286, 'learning_rate': 9.284065703190221e-06, 'epoch': 0.54} {'loss': 0.7729, 'learning_rate': 9.27785098535548e-06, 'epoch': 0.54} {'loss': 0.791, 'learning_rate': 9.271636547883856e-06, 'epoch': 0.54} {'loss': 0.8066, 'learning_rate': 9.265422393187998e-06, 'epoch': 0.54} {'loss': 0.7588, 'learning_rate': 9.259208523680457e-06, 'epoch': 0.54} {'loss': 0.7891, 'learning_rate': 9.252994941773679e-06, 'epoch': 0.54} {'loss': 0.813, 'learning_rate': 9.24678164987998e-06, 'epoch': 0.54} {'loss': 0.7949, 'learning_rate': 9.24056865041158e-06, 'epoch': 0.54} {'loss': 0.7891, 'learning_rate': 9.234355945780581e-06, 'epoch': 0.54} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/673384772.jpg' {'loss': 0.7329, 'learning_rate': 9.228143538398963e-06, 'epoch': 0.54} {'loss': 0.8081, 'learning_rate': 9.221931430678598e-06, 'epoch': 0.54} {'loss': 0.7886, 'learning_rate': 9.215719625031245e-06, 'epoch': 0.54} {'loss': 0.7944, 'learning_rate': 9.209508123868534e-06, 'epoch': 0.54} {'loss': 0.812, 'learning_rate': 9.203296929601986e-06, 'epoch': 0.54} {'loss': 0.7607, 'learning_rate': 9.197086044643004e-06, 'epoch': 0.54} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/870331612.jpg' [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/B011M9LHUO.jpg' {'loss': 0.2668, 'learning_rate': 9.190875471402865e-06, 'epoch': 0.54} {'loss': 0.7437, 'learning_rate': 9.184665212292723e-06, 'epoch': 0.54} {'loss': 0.7461, 'learning_rate': 9.178455269723623e-06, 'epoch': 0.54} {'loss': 0.7876, 'learning_rate': 9.172245646106471e-06, 'epoch': 0.54} {'loss': 0.8101, 'learning_rate': 9.166036343852061e-06, 'epoch': 0.54} {'loss': 0.7759, 'learning_rate': 9.159827365371055e-06, 'epoch': 
0.54} {'loss': 0.7637, 'learning_rate': 9.153618713073995e-06, 'epoch': 0.54} {'loss': 0.7148, 'learning_rate': 9.14741038937129e-06, 'epoch': 0.54} {'loss': 0.7666, 'learning_rate': 9.141202396673232e-06, 'epoch': 0.54} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/133099156.jpg' {'loss': 0.8359, 'learning_rate': 9.13499473738997e-06, 'epoch': 0.54} {'loss': 0.7822, 'learning_rate': 9.128787413931536e-06, 'epoch': 0.54} {'loss': 0.8105, 'learning_rate': 9.122580428707822e-06, 'epoch': 0.54} {'loss': 0.7524, 'learning_rate': 9.116373784128597e-06, 'epoch': 0.54} {'loss': 0.772, 'learning_rate': 9.110167482603494e-06, 'epoch': 0.54} {'loss': 0.8105, 'learning_rate': 9.10396152654201e-06, 'epoch': 0.54} {'loss': 0.7939, 'learning_rate': 9.097755918353513e-06, 'epoch': 0.54} {'loss': 0.8608, 'learning_rate': 9.091550660447236e-06, 'epoch': 0.54} {'loss': 0.8418, 'learning_rate': 9.08534575523227e-06, 'epoch': 0.54} {'loss': 0.8237, 'learning_rate': 9.079141205117573e-06, 'epoch': 0.54} {'loss': 0.2661, 'learning_rate': 9.072937012511968e-06, 'epoch': 0.54} {'loss': 0.749, 'learning_rate': 9.066733179824134e-06, 'epoch': 0.54} {'loss': 0.7217, 'learning_rate': 9.060529709462613e-06, 'epoch': 0.54} {'loss': 0.7998, 'learning_rate': 9.054326603835807e-06, 'epoch': 0.54} {'loss': 0.7134, 'learning_rate': 9.048123865351971e-06, 'epoch': 0.54} {'loss': 0.7681, 'learning_rate': 9.041921496419225e-06, 'epoch': 0.54} {'loss': 0.7563, 'learning_rate': 9.035719499445545e-06, 'epoch': 0.54} {'loss': 0.772, 'learning_rate': 9.029517876838755e-06, 'epoch': 0.55} {'loss': 0.8301, 'learning_rate': 9.023316631006536e-06, 'epoch': 0.55} {'loss': 0.7764, 'learning_rate': 9.017115764356436e-06, 'epoch': 0.55} {'loss': 0.7759, 'learning_rate': 9.010915279295833e-06, 'epoch': 0.55} {'loss': 0.7876, 'learning_rate': 9.004715178231975e-06, 'epoch': 0.55} {'loss': 0.7871, 'learning_rate': 8.998515463571953e-06, 'epoch': 0.55} {'loss': 0.793, 'learning_rate': 8.992316137722711e-06, 'epoch': 0.55} {'loss': 0.7559, 'learning_rate': 8.986117203091042e-06, 'epoch': 0.55} {'loss': 0.8101, 'learning_rate': 8.97991866208358e-06, 'epoch': 0.55} {'loss': 0.8447, 'learning_rate': 8.973720517106814e-06, 'epoch': 0.55} {'loss': 0.8042, 'learning_rate': 8.967522770567086e-06, 'epoch': 0.55} {'loss': 0.7925, 'learning_rate': 8.961325424870561e-06, 'epoch': 0.55} [2024-01-31 06:28:12,321] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7842, 'learning_rate': 8.955128482423271e-06, 'epoch': 0.55} [2024-01-31 06:28:31,312] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7739, 'learning_rate': 8.948931945631082e-06, 'epoch': 0.55} {'loss': 0.7788, 'learning_rate': 8.9427358168997e-06, 'epoch': 0.55} [2024-01-31 06:29:13,549] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7769, 'learning_rate': 8.936540098634675e-06, 'epoch': 0.55} [2024-01-31 06:29:31,929] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7754, 'learning_rate': 8.930344793241404e-06, 'epoch': 0.55} {'loss': 0.7729, 'learning_rate': 8.924149903125108e-06, 'epoch': 0.55} {'loss': 0.791, 'learning_rate': 8.917955430690865e-06, 'epoch': 0.55} [2024-01-31 06:30:25,661] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7725, 'learning_rate': 8.91176137834358e-06, 'epoch': 0.55} [2024-01-31 06:30:42,036] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8042, 'learning_rate': 8.905567748487997e-06, 'epoch': 0.55} {'loss': 0.7915, 'learning_rate': 8.899374543528695e-06, 'epoch': 0.55} {'loss': 0.7432, 'learning_rate': 8.893181765870094e-06, 'epoch': 0.55} {'loss': 0.7827, 'learning_rate': 8.886989417916435e-06, 'epoch': 0.55} {'loss': 0.772, 'learning_rate': 8.88079750207181e-06, 'epoch': 0.55} {'loss': 0.7842, 'learning_rate': 8.87460602074013e-06, 'epoch': 0.55} {'loss': 0.752, 'learning_rate': 8.86841497632514e-06, 'epoch': 0.55} {'loss': 0.2802, 'learning_rate': 8.862224371230418e-06, 'epoch': 0.55} {'loss': 0.8501, 'learning_rate': 8.85603420785937e-06, 'epoch': 0.55} {'loss': 0.7476, 'learning_rate': 8.84984448861523e-06, 'epoch': 0.55} {'loss': 0.8052, 'learning_rate': 8.84365521590106e-06, 'epoch': 0.55} {'loss': 0.7637, 'learning_rate': 8.837466392119752e-06, 'epoch': 0.55} {'loss': 0.7944, 'learning_rate': 8.831278019674017e-06, 'epoch': 0.55} {'loss': 0.7388, 'learning_rate': 8.825090100966396e-06, 'epoch': 0.55} {'loss': 0.8228, 'learning_rate': 8.818902638399247e-06, 'epoch': 0.55} {'loss': 0.8164, 'learning_rate': 8.81271563437476e-06, 'epoch': 0.55} {'loss': 0.7988, 'learning_rate': 8.806529091294948e-06, 'epoch': 0.55} {'loss': 0.7769, 'learning_rate': 8.800343011561633e-06, 'epoch': 0.55} {'loss': 0.7734, 'learning_rate': 8.794157397576464e-06, 'epoch': 0.55} {'loss': 0.7686, 'learning_rate': 8.787972251740916e-06, 'epoch': 0.55} {'loss': 0.8301, 'learning_rate': 8.781787576456269e-06, 'epoch': 0.55} {'loss': 0.7495, 'learning_rate': 8.775603374123627e-06, 'epoch': 0.55} {'loss': 0.7939, 'learning_rate': 8.769419647143917e-06, 'epoch': 0.55} {'loss': 0.814, 'learning_rate': 8.763236397917865e-06, 'epoch': 0.55} {'loss': 0.7754, 'learning_rate': 8.757053628846028e-06, 'epoch': 0.55} {'loss': 0.7666, 'learning_rate': 8.75087134232877e-06, 'epoch': 0.55} {'loss': 0.7856, 'learning_rate': 8.744689540766265e-06, 'epoch': 0.55} {'loss': 0.7661, 'learning_rate': 8.738508226558499e-06, 'epoch': 0.55} {'loss': 0.8101, 'learning_rate': 8.73232740210528e-06, 'epoch': 0.55} {'loss': 0.7788, 'learning_rate': 8.726147069806206e-06, 'epoch': 0.55} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/739714023.jpg' {'loss': 0.7632, 'learning_rate': 8.719967232060698e-06, 'epoch': 0.55} {'loss': 0.812, 'learning_rate': 8.713787891267988e-06, 'epoch': 0.55} {'loss': 0.8354, 'learning_rate': 8.707609049827102e-06, 'epoch': 0.56} {'loss': 0.7124, 'learning_rate': 8.70143071013688e-06, 'epoch': 0.56} {'loss': 0.7822, 'learning_rate': 8.695252874595972e-06, 'epoch': 0.56} {'loss': 0.8169, 'learning_rate': 8.689075545602816e-06, 'epoch': 0.56} {'loss': 0.8115, 'learning_rate': 8.68289872555567e-06, 'epoch': 0.56} {'loss': 0.7886, 'learning_rate': 8.676722416852594e-06, 'epoch': 0.56} {'loss': 0.7886, 'learning_rate': 8.670546621891434e-06, 'epoch': 0.56} {'loss': 0.7861, 'learning_rate': 8.66437134306985e-06, 'epoch': 0.56} {'loss': 0.7969, 'learning_rate': 8.658196582785297e-06, 'epoch': 0.56} {'loss': 0.8438, 'learning_rate': 8.652022343435027e-06, 'epoch': 0.56} {'loss': 0.7207, 'learning_rate': 8.645848627416102e-06, 'epoch': 0.56} {'loss': 0.8262, 'learning_rate': 8.63967543712536e-06, 'epoch': 0.56} {'loss': 0.7778, 'learning_rate': 8.633502774959453e-06, 'epoch': 0.56} {'loss': 0.7773, 'learning_rate': 8.627330643314818e-06, 
'epoch': 0.56} [2024-01-31 06:44:45,575] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7305, 'learning_rate': 8.62115904458769e-06, 'epoch': 0.56} {'loss': 0.7964, 'learning_rate': 8.614987981174093e-06, 'epoch': 0.56} {'loss': 0.791, 'learning_rate': 8.608817455469854e-06, 'epoch': 0.56} {'loss': 0.7856, 'learning_rate': 8.602647469870573e-06, 'epoch': 0.56} {'loss': 0.7407, 'learning_rate': 8.596478026771658e-06, 'epoch': 0.56} {'loss': 0.813, 'learning_rate': 8.590309128568303e-06, 'epoch': 0.56} {'loss': 0.2517, 'learning_rate': 8.584140777655476e-06, 'epoch': 0.56} {'loss': 0.7764, 'learning_rate': 8.57797297642795e-06, 'epoch': 0.56} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/415913756.jpg' {'loss': 0.7812, 'learning_rate': 8.571805727280278e-06, 'epoch': 0.56} {'loss': 0.832, 'learning_rate': 8.565639032606794e-06, 'epoch': 0.56} {'loss': 0.8193, 'learning_rate': 8.559472894801623e-06, 'epoch': 0.56} {'loss': 0.8628, 'learning_rate': 8.553307316258678e-06, 'epoch': 0.56} {'loss': 0.7744, 'learning_rate': 8.547142299371642e-06, 'epoch': 0.56} {'loss': 0.8403, 'learning_rate': 8.540977846533986e-06, 'epoch': 0.56} {'loss': 0.8237, 'learning_rate': 8.534813960138968e-06, 'epoch': 0.56} {'loss': 0.7715, 'learning_rate': 8.528650642579618e-06, 'epoch': 0.56} {'loss': 0.7422, 'learning_rate': 8.52248789624875e-06, 'epoch': 0.56} {'loss': 0.7974, 'learning_rate': 8.516325723538949e-06, 'epoch': 0.56} {'loss': 0.771, 'learning_rate': 8.510164126842591e-06, 'epoch': 0.56} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/882894293.jpg' {'loss': 0.7905, 'learning_rate': 8.504003108551814e-06, 'epoch': 0.56} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/749521643.jpg' {'loss': 0.7666, 'learning_rate': 8.497842671058539e-06, 'epoch': 0.56} {'loss': 0.7988, 'learning_rate': 8.491682816754456e-06, 'epoch': 0.56} {'loss': 0.8042, 'learning_rate': 8.485523548031044e-06, 'epoch': 0.56} {'loss': 0.7402, 'learning_rate': 8.479364867279529e-06, 'epoch': 0.56} {'loss': 0.7817, 'learning_rate': 8.47320677689093e-06, 'epoch': 0.56} {'loss': 0.7915, 'learning_rate': 8.467049279256034e-06, 'epoch': 0.56} {'loss': 0.8301, 'learning_rate': 8.460892376765387e-06, 'epoch': 0.56} {'loss': 0.8203, 'learning_rate': 8.45473607180931e-06, 'epoch': 0.56} {'loss': 0.792, 'learning_rate': 8.448580366777898e-06, 'epoch': 0.56} {'loss': 0.7905, 'learning_rate': 8.442425264061e-06, 'epoch': 0.56} {'loss': 0.7485, 'learning_rate': 8.436270766048245e-06, 'epoch': 0.56} {'loss': 0.7944, 'learning_rate': 8.430116875129023e-06, 'epoch': 0.56} {'loss': 0.7861, 'learning_rate': 8.42396359369248e-06, 'epoch': 0.56} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/966586611.jpg' {'loss': 0.7871, 'learning_rate': 8.417810924127533e-06, 'epoch': 0.56} {'loss': 0.8291, 'learning_rate': 8.411658868822866e-06, 'epoch': 0.56} {'loss': 0.7554, 'learning_rate': 8.40550743016691e-06, 'epoch': 0.56} {'loss': 0.7773, 'learning_rate': 8.39935661054787e-06, 'epoch': 0.56} {'loss': 0.8096, 'learning_rate': 8.393206412353709e-06, 'epoch': 0.56} 
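Aside on the learning_rate values in this span: they decay smoothly from about 1.16e-05 near epoch 0.46 to about 8.4e-06 here, and earlier in the log the value passes through exactly 1e-05 near epoch 0.52. That shape is consistent with a cosine-decay schedule with a short linear warmup peaking at 2e-5, but the run's actual scheduler and peak LR are not shown in this log; the constants in the sketch below are assumptions chosen to reproduce the logged numbers, not values read from a config.

    import math

    PEAK_LR = 2e-5        # assumed peak learning rate (not shown in the log)
    WARMUP_RATIO = 0.03   # assumed linear-warmup fraction
    TOTAL_EPOCHS = 1.0    # assumed single-epoch run

    def lr_at(epoch):
        """Cosine decay with linear warmup, the shape the logged values suggest."""
        frac = epoch / TOTAL_EPOCHS
        if frac < WARMUP_RATIO:
            return PEAK_LR * frac / WARMUP_RATIO
        progress = (frac - WARMUP_RATIO) / (1.0 - WARMUP_RATIO)
        return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))

    print(lr_at(0.515))   # ~1.0e-05, matching the 'learning_rate': 1e-05 entry near epoch 0.52
    print(lr_at(0.47))    # ~1.15e-05, close to the values logged at epoch 0.47
    print(lr_at(0.56))    # ~8.5e-06, close to the values logged on this stretch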
{'loss': 0.8398, 'learning_rate': 8.38705683797214e-06, 'epoch': 0.57} {'loss': 0.7886, 'learning_rate': 8.38090788979064e-06, 'epoch': 0.57} {'loss': 0.8232, 'learning_rate': 8.374759570196448e-06, 'epoch': 0.57} {'loss': 0.8311, 'learning_rate': 8.368611881576547e-06, 'epoch': 0.57} {'loss': 0.7783, 'learning_rate': 8.362464826317687e-06, 'epoch': 0.57} {'loss': 0.7886, 'learning_rate': 8.35631840680636e-06, 'epoch': 0.57} {'loss': 0.8804, 'learning_rate': 8.35017262542882e-06, 'epoch': 0.57} {'loss': 0.7417, 'learning_rate': 8.344027484571075e-06, 'epoch': 0.57} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/B013RVJ7KW.jpg' {'loss': 0.8276, 'learning_rate': 8.337882986618877e-06, 'epoch': 0.57} {'loss': 0.255, 'learning_rate': 8.331739133957729e-06, 'epoch': 0.57} {'loss': 0.8213, 'learning_rate': 8.325595928972894e-06, 'epoch': 0.57} {'loss': 0.7979, 'learning_rate': 8.319453374049367e-06, 'epoch': 0.57} {'loss': 0.7993, 'learning_rate': 8.313311471571903e-06, 'epoch': 0.57} {'loss': 0.7847, 'learning_rate': 8.307170223925003e-06, 'epoch': 0.57} {'loss': 0.8203, 'learning_rate': 8.301029633492907e-06, 'epoch': 0.57} {'loss': 0.2557, 'learning_rate': 8.294889702659602e-06, 'epoch': 0.57} {'loss': 0.7495, 'learning_rate': 8.288750433808828e-06, 'epoch': 0.57} {'loss': 0.7637, 'learning_rate': 8.282611829324049e-06, 'epoch': 0.57} {'loss': 0.8071, 'learning_rate': 8.276473891588492e-06, 'epoch': 0.57} {'loss': 0.7705, 'learning_rate': 8.270336622985116e-06, 'epoch': 0.57} {'loss': 0.8257, 'learning_rate': 8.264200025896616e-06, 'epoch': 0.57} {'loss': 0.8208, 'learning_rate': 8.258064102705428e-06, 'epoch': 0.57} {'loss': 0.8184, 'learning_rate': 8.251928855793736e-06, 'epoch': 0.57} {'loss': 0.7847, 'learning_rate': 8.245794287543447e-06, 'epoch': 0.57} {'loss': 0.7827, 'learning_rate': 8.239660400336213e-06, 'epoch': 0.57} {'loss': 0.7607, 'learning_rate': 8.233527196553428e-06, 'epoch': 0.57} {'loss': 0.8374, 'learning_rate': 8.227394678576204e-06, 'epoch': 0.57} {'loss': 0.7808, 'learning_rate': 8.221262848785395e-06, 'epoch': 0.57} {'loss': 0.7715, 'learning_rate': 8.215131709561597e-06, 'epoch': 0.57} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/375400664.jpg' {'loss': 0.8022, 'learning_rate': 8.20900126328512e-06, 'epoch': 0.57} {'loss': 0.6821, 'learning_rate': 8.202871512336023e-06, 'epoch': 0.57} {'loss': 0.731, 'learning_rate': 8.196742459094079e-06, 'epoch': 0.57} {'loss': 0.7539, 'learning_rate': 8.190614105938796e-06, 'epoch': 0.57} {'loss': 0.7749, 'learning_rate': 8.184486455249424e-06, 'epoch': 0.57} {'loss': 0.7881, 'learning_rate': 8.178359509404916e-06, 'epoch': 0.57} {'loss': 0.7822, 'learning_rate': 8.172233270783966e-06, 'epoch': 0.57} {'loss': 0.7598, 'learning_rate': 8.166107741764997e-06, 'epoch': 0.57} {'loss': 0.7588, 'learning_rate': 8.15998292472614e-06, 'epoch': 0.57} {'loss': 0.7471, 'learning_rate': 8.153858822045267e-06, 'epoch': 0.57} {'loss': 0.7725, 'learning_rate': 8.147735436099967e-06, 'epoch': 0.57} {'loss': 0.8281, 'learning_rate': 8.141612769267543e-06, 'epoch': 0.57} {'loss': 0.7681, 'learning_rate': 8.135490823925027e-06, 'epoch': 0.57} {'loss': 0.7568, 'learning_rate': 8.129369602449176e-06, 'epoch': 0.57} {'loss': 0.7622, 'learning_rate': 8.123249107216446e-06, 'epoch': 0.57} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/471542989.jpg' {'loss': 0.7515, 'learning_rate': 8.117129340603032e-06, 'epoch': 0.57} {'loss': 0.7627, 'learning_rate': 8.111010304984841e-06, 
'epoch': 0.57} {'loss': 0.7656, 'learning_rate': 8.104892002737488e-06, 'epoch': 0.57} {'loss': 0.7485, 'learning_rate': 8.098774436236308e-06, 'epoch': 0.57} {'loss': 0.7456, 'learning_rate': 8.092657607856356e-06, 'epoch': 0.57} {'loss': 0.7349, 'learning_rate': 8.086541519972388e-06, 'epoch': 0.57} {'loss': 0.7979, 'learning_rate': 8.080426174958886e-06, 'epoch': 0.57} {'loss': 0.7456, 'learning_rate': 8.074311575190039e-06, 'epoch': 0.57} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/915801841.jpg' {'loss': 0.2748, 'learning_rate': 8.068197723039738e-06, 'epoch': 0.58} {'loss': 0.769, 'learning_rate': 8.062084620881598e-06, 'epoch': 0.58} {'loss': 0.7715, 'learning_rate': 8.055972271088933e-06, 'epoch': 0.58} {'loss': 0.7661, 'learning_rate': 8.049860676034762e-06, 'epoch': 0.58} {'loss': 0.7598, 'learning_rate': 8.043749838091828e-06, 'epoch': 0.58} {'loss': 0.8223, 'learning_rate': 8.037639759632558e-06, 'epoch': 0.58} {'loss': 0.7446, 'learning_rate': 8.031530443029099e-06, 'epoch': 0.58} {'loss': 0.7388, 'learning_rate': 8.025421890653303e-06, 'epoch': 0.58} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/805034676.jpg' {'loss': 0.8203, 'learning_rate': 8.019314104876712e-06, 'epoch': 0.58} {'loss': 0.7217, 'learning_rate': 8.013207088070582e-06, 'epoch': 0.58} {'loss': 0.811, 'learning_rate': 8.007100842605872e-06, 'epoch': 0.58} {'loss': 0.8076, 'learning_rate': 8.000995370853227e-06, 'epoch': 0.58} [2024-01-31 07:18:37,508] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8091, 'learning_rate': 7.994890675183008e-06, 'epoch': 0.58} {'loss': 0.7891, 'learning_rate': 7.98878675796527e-06, 'epoch': 0.58} {'loss': 0.7749, 'learning_rate': 7.98268362156976e-06, 'epoch': 0.58} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/471243787.jpg' {'loss': 0.7832, 'learning_rate': 7.976581268365924e-06, 'epoch': 0.58} {'loss': 0.751, 'learning_rate': 7.97047970072291e-06, 'epoch': 0.58} {'loss': 0.8115, 'learning_rate': 7.964378921009552e-06, 'epoch': 0.58} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/289800900.jpg' {'loss': 0.7754, 'learning_rate': 7.958278931594385e-06, 'epoch': 0.58} {'loss': 0.2709, 'learning_rate': 7.952179734845642e-06, 'epoch': 0.58} {'loss': 0.7251, 'learning_rate': 7.946081333131227e-06, 'epoch': 0.58} {'loss': 0.8101, 'learning_rate': 7.93998372881876e-06, 'epoch': 0.58} {'loss': 0.7915, 'learning_rate': 7.93388692427554e-06, 'epoch': 0.58} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1580170536.jpg' [2024-01-31 07:22:03,210] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7051, 'learning_rate': 7.92779092186855e-06, 'epoch': 0.58} {'loss': 0.7246, 'learning_rate': 7.921695723964473e-06, 'epoch': 0.58} [2024-01-31 07:22:40,317] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8345, 'learning_rate': 7.915601332929678e-06, 'epoch': 0.58} {'loss': 0.7188, 'learning_rate': 7.90950775113021e-06, 'epoch': 0.58} [2024-01-31 07:23:20,483] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.79, 'learning_rate': 7.903414980931813e-06, 'epoch': 0.58} {'loss': 0.7407, 'learning_rate': 7.897323024699907e-06, 'epoch': 0.58} {'loss': 0.7603, 'learning_rate': 7.8912318847996e-06, 'epoch': 0.58} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/014005667X.jpg' [2024-01-31 07:24:14,370] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7798, 'learning_rate': 7.885141563595685e-06, 'epoch': 0.58} {'loss': 0.7632, 'learning_rate': 7.879052063452626e-06, 'epoch': 0.58} {'loss': 0.7573, 'learning_rate': 7.872963386734584e-06, 'epoch': 0.58} {'loss': 0.7935, 'learning_rate': 7.866875535805394e-06, 'epoch': 0.58} [2024-01-31 07:25:25,042] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7783, 'learning_rate': 7.860788513028566e-06, 'epoch': 0.58} {'loss': 0.7998, 'learning_rate': 7.85470232076729e-06, 'epoch': 0.58} {'loss': 0.8008, 'learning_rate': 7.848616961384442e-06, 'epoch': 0.58} [2024-01-31 07:26:16,446] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8286, 'learning_rate': 7.842532437242559e-06, 'epoch': 0.58} {'loss': 0.7637, 'learning_rate': 7.83644875070387e-06, 'epoch': 0.58} [2024-01-31 07:26:51,215] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7598, 'learning_rate': 7.83036590413027e-06, 'epoch': 0.58} [2024-01-31 07:27:09,868] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8047, 'learning_rate': 7.824283899883327e-06, 'epoch': 0.58} [2024-01-31 07:27:30,691] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7461, 'learning_rate': 7.818202740324287e-06, 'epoch': 0.58} {'loss': 0.7852, 'learning_rate': 7.812122427814068e-06, 'epoch': 0.58} {'loss': 0.7822, 'learning_rate': 7.806042964713248e-06, 'epoch': 0.58} {'loss': 0.7588, 'learning_rate': 7.79996435338209e-06, 'epoch': 0.58} {'loss': 0.7539, 'learning_rate': 7.793886596180521e-06, 'epoch': 0.58} {'loss': 0.7671, 'learning_rate': 7.787809695468134e-06, 'epoch': 0.58} {'loss': 0.2697, 'learning_rate': 7.78173365360419e-06, 'epoch': 0.58} {'loss': 0.813, 'learning_rate': 7.775658472947623e-06, 'epoch': 0.58} {'loss': 0.792, 'learning_rate': 7.769584155857019e-06, 'epoch': 0.58} {'loss': 0.8008, 'learning_rate': 7.763510704690645e-06, 'epoch': 0.58} {'loss': 0.7705, 'learning_rate': 7.757438121806414e-06, 'epoch': 0.58} {'loss': 0.7798, 'learning_rate': 7.75136640956192e-06, 'epoch': 0.59} {'loss': 0.751, 'learning_rate': 7.745295570314412e-06, 'epoch': 0.59} {'loss': 0.8052, 'learning_rate': 7.739225606420793e-06, 'epoch': 0.59} {'loss': 0.7822, 'learning_rate': 7.733156520237633e-06, 'epoch': 0.59} {'loss': 0.8193, 'learning_rate': 7.727088314121165e-06, 'epoch': 0.59} {'loss': 0.7573, 'learning_rate': 7.721020990427268e-06, 'epoch': 0.59} {'loss': 0.7891, 'learning_rate': 7.714954551511489e-06, 'epoch': 0.59} {'loss': 0.7725, 'learning_rate': 7.708888999729036e-06, 'epoch': 0.59} {'loss': 0.7695, 'learning_rate': 7.702824337434756e-06, 'epoch': 0.59} {'loss': 0.2722, 'learning_rate': 7.69676056698316e-06, 'epoch': 0.59} {'loss': 0.791, 'learning_rate': 7.690697690728417e-06, 'epoch': 0.59} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/816512019.jpg' 
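The recurring "[Errno 2] No such file or directory" lines in this section all point at ./playground/data/ocr_vqa/images/, i.e. some OCR-VQA images referenced by the training annotations are absent on disk, so the dataloader has to skip or fall back on those samples (the exact handling is not visible in this log). A minimal pre-filtering sketch follows; the annotation filename and the 'image' field layout are assumptions based on the LLaVA-style playground/data paths in the errors, so adjust them to the json this run actually uses.

    import json
    from pathlib import Path

    IMAGE_ROOT = Path("./playground/data")
    ANNOTATIONS = IMAGE_ROOT / "llava_v1_5_mix665k.json"   # assumed filename

    def drop_samples_with_missing_images(annotations_path, image_root):
        samples = json.loads(annotations_path.read_text())
        kept, dropped = [], []
        for sample in samples:
            image = sample.get("image")            # text-only samples carry no image
            if image is None or (image_root / image).exists():
                kept.append(sample)
            else:
                dropped.append(image)
        print(f"kept {len(kept)} samples, dropped {len(dropped)} with missing images")
        return kept

    if __name__ == "__main__":
        cleaned = drop_samples_with_missing_images(ANNOTATIONS, IMAGE_ROOT)
        (IMAGE_ROOT / "mix665k_missing_filtered.json").write_text(json.dumps(cleaned))

Running this once before launching training, and pointing the data arguments at the filtered json, avoids the per-step file errors without touching the dataloader itself.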
{'loss': 0.7715, 'learning_rate': 7.68463571102434e-06, 'epoch': 0.59} {'loss': 0.7451, 'learning_rate': 7.678574630224399e-06, 'epoch': 0.59} {'loss': 0.7563, 'learning_rate': 7.672514450681721e-06, 'epoch': 0.59} {'loss': 0.7822, 'learning_rate': 7.666455174749066e-06, 'epoch': 0.59} {'loss': 0.7632, 'learning_rate': 7.66039680477886e-06, 'epoch': 0.59} {'loss': 0.7646, 'learning_rate': 7.654339343123173e-06, 'epoch': 0.59} {'loss': 0.7559, 'learning_rate': 7.648282792133711e-06, 'epoch': 0.59} {'loss': 0.7407, 'learning_rate': 7.642227154161841e-06, 'epoch': 0.59} {'loss': 0.8086, 'learning_rate': 7.636172431558575e-06, 'epoch': 0.59} {'loss': 0.8477, 'learning_rate': 7.630118626674557e-06, 'epoch': 0.59} {'loss': 0.2758, 'learning_rate': 7.6240657418600846e-06, 'epoch': 0.59} {'loss': 0.2693, 'learning_rate': 7.618013779465101e-06, 'epoch': 0.59} {'loss': 0.7759, 'learning_rate': 7.611962741839178e-06, 'epoch': 0.59} {'loss': 0.8081, 'learning_rate': 7.6059126313315466e-06, 'epoch': 0.59} {'loss': 0.7881, 'learning_rate': 7.599863450291056e-06, 'epoch': 0.59} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1564588963.jpg' {'loss': 0.8032, 'learning_rate': 7.593815201066215e-06, 'epoch': 0.59} [2024-01-31 07:39:17,780] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7886, 'learning_rate': 7.587767886005164e-06, 'epoch': 0.59} [2024-01-31 07:39:34,783] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8062, 'learning_rate': 7.581721507455672e-06, 'epoch': 0.59} {'loss': 0.8008, 'learning_rate': 7.575676067765154e-06, 'epoch': 0.59} {'loss': 0.79, 'learning_rate': 7.569631569280662e-06, 'epoch': 0.59} {'loss': 0.7822, 'learning_rate': 7.563588014348871e-06, 'epoch': 0.59} {'loss': 0.8047, 'learning_rate': 7.5575454053161e-06, 'epoch': 0.59} {'loss': 0.8052, 'learning_rate': 7.551503744528304e-06, 'epoch': 0.59} {'loss': 0.7319, 'learning_rate': 7.545463034331054e-06, 'epoch': 0.59} {'loss': 0.7974, 'learning_rate': 7.539423277069568e-06, 'epoch': 0.59} {'loss': 0.2616, 'learning_rate': 7.53338447508869e-06, 'epoch': 0.59} {'loss': 0.7588, 'learning_rate': 7.52734663073288e-06, 'epoch': 0.59} {'loss': 0.748, 'learning_rate': 7.521309746346246e-06, 'epoch': 0.59} {'loss': 0.8257, 'learning_rate': 7.515273824272516e-06, 'epoch': 0.59} {'loss': 0.8057, 'learning_rate': 7.509238866855033e-06, 'epoch': 0.59} {'loss': 0.2621, 'learning_rate': 7.503204876436785e-06, 'epoch': 0.59} {'loss': 0.7876, 'learning_rate': 7.497171855360372e-06, 'epoch': 0.59} {'loss': 0.7866, 'learning_rate': 7.491139805968018e-06, 'epoch': 0.59} {'loss': 0.7568, 'learning_rate': 7.485108730601571e-06, 'epoch': 0.59} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/316051772.jpg' {'loss': 0.7583, 'learning_rate': 7.4790786316025125e-06, 'epoch': 0.59} {'loss': 0.792, 'learning_rate': 7.473049511311921e-06, 'epoch': 0.59} {'loss': 0.8008, 'learning_rate': 7.467021372070515e-06, 'epoch': 0.59} {'loss': 0.2832, 'learning_rate': 7.46099421621863e-06, 'epoch': 0.59} {'loss': 0.248, 'learning_rate': 7.4549680460962044e-06, 'epoch': 0.59} {'loss': 0.8311, 'learning_rate': 7.448942864042819e-06, 'epoch': 0.59} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/055305340X.jpg' {'loss': 0.2729, 'learning_rate': 7.4429186723976425e-06, 'epoch': 0.59} {'loss': 0.7964, 'learning_rate': 7.43689547349948e-06, 'epoch': 0.6} {'loss': 0.7939, 'learning_rate': 7.43087326968675e-06, 'epoch': 0.6} {'loss': 0.7617, 'learning_rate': 7.42485206329747e-06, 'epoch': 0.6} {'loss': 0.7837, 'learning_rate': 7.418831856669286e-06, 'epoch': 0.6} {'loss': 0.7793, 'learning_rate': 7.41281265213945e-06, 'epoch': 0.6} {'loss': 0.7983, 'learning_rate': 7.406794452044816e-06, 'epoch': 0.6} {'loss': 0.8066, 'learning_rate': 7.400777258721865e-06, 'epoch': 0.6} {'loss': 0.7988, 'learning_rate': 7.394761074506679e-06, 'epoch': 0.6} {'loss': 0.8442, 'learning_rate': 7.3887459017349405e-06, 'epoch': 0.6} {'loss': 0.7524, 'learning_rate': 7.382731742741953e-06, 'epoch': 0.6} {'loss': 0.7705, 'learning_rate': 7.376718599862621e-06, 'epoch': 0.6} {'loss': 0.7783, 'learning_rate': 7.370706475431446e-06, 'epoch': 0.6} {'loss': 0.8315, 'learning_rate': 7.364695371782547e-06, 'epoch': 0.6} {'loss': 0.7832, 'learning_rate': 7.358685291249644e-06, 'epoch': 0.6} {'loss': 0.8369, 'learning_rate': 7.352676236166051e-06, 'epoch': 0.6} {'loss': 0.7578, 'learning_rate': 7.346668208864695e-06, 'epoch': 0.6} {'loss': 0.854, 'learning_rate': 7.3406612116781e-06, 'epoch': 0.6} {'loss': 0.7314, 'learning_rate': 7.33465524693838e-06, 'epoch': 0.6} {'loss': 0.7651, 'learning_rate': 7.328650316977265e-06, 'epoch': 0.6} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/B00WTKH3HC.jpg' {'loss': 0.7886, 'learning_rate': 
7.322646424126079e-06, 'epoch': 0.6} {'loss': 0.7773, 'learning_rate': 7.316643570715729e-06, 'epoch': 0.6} {'loss': 0.7778, 'learning_rate': 7.310641759076742e-06, 'epoch': 0.6} {'loss': 0.7661, 'learning_rate': 7.304640991539216e-06, 'epoch': 0.6} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/471148288.jpg' {'loss': 0.7915, 'learning_rate': 7.2986412704328625e-06, 'epoch': 0.6} {'loss': 0.7983, 'learning_rate': 7.292642598086982e-06, 'epoch': 0.6} {'loss': 0.7939, 'learning_rate': 7.286644976830457e-06, 'epoch': 0.6} {'loss': 0.7739, 'learning_rate': 7.280648408991775e-06, 'epoch': 0.6} {'loss': 0.7393, 'learning_rate': 7.274652896899015e-06, 'epoch': 0.6} {'loss': 0.8291, 'learning_rate': 7.268658442879834e-06, 'epoch': 0.6} {'loss': 0.7539, 'learning_rate': 7.262665049261489e-06, 'epoch': 0.6} {'loss': 0.8232, 'learning_rate': 7.256672718370824e-06, 'epoch': 0.6} {'loss': 0.7822, 'learning_rate': 7.250681452534261e-06, 'epoch': 0.6} {'loss': 0.8042, 'learning_rate': 7.2446912540778196e-06, 'epoch': 0.6} {'loss': 0.7471, 'learning_rate': 7.238702125327106e-06, 'epoch': 0.6} {'loss': 0.7876, 'learning_rate': 7.232714068607296e-06, 'epoch': 0.6} {'loss': 0.2489, 'learning_rate': 7.226727086243168e-06, 'epoch': 0.6} {'loss': 0.7705, 'learning_rate': 7.220741180559074e-06, 'epoch': 0.6} {'loss': 0.8057, 'learning_rate': 7.214756353878942e-06, 'epoch': 0.6} {'loss': 0.7397, 'learning_rate': 7.208772608526293e-06, 'epoch': 0.6} {'loss': 0.7778, 'learning_rate': 7.202789946824227e-06, 'epoch': 0.6} {'loss': 0.8115, 'learning_rate': 7.1968083710954075e-06, 'epoch': 0.6} {'loss': 0.7632, 'learning_rate': 7.1908278836621e-06, 'epoch': 0.6} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/739705539.jpg' {'loss': 0.7969, 'learning_rate': 7.184848486846128e-06, 'epoch': 0.6} {'loss': 0.8457, 'learning_rate': 7.178870182968904e-06, 'epoch': 0.6} {'loss': 0.8037, 'learning_rate': 7.1728929743514065e-06, 'epoch': 0.6} {'loss': 0.8159, 'learning_rate': 7.166916863314199e-06, 'epoch': 0.6} {'loss': 0.8198, 'learning_rate': 7.1609418521774095e-06, 'epoch': 0.6} {'loss': 0.7422, 'learning_rate': 7.154967943260748e-06, 'epoch': 0.6} {'loss': 0.7505, 'learning_rate': 7.148995138883483e-06, 'epoch': 0.6} {'loss': 0.7534, 'learning_rate': 7.143023441364471e-06, 'epoch': 0.6} {'loss': 0.7583, 'learning_rate': 7.13705285302213e-06, 'epoch': 0.6} {'loss': 0.7944, 'learning_rate': 7.131083376174441e-06, 'epoch': 0.6} {'loss': 0.7798, 'learning_rate': 7.125115013138966e-06, 'epoch': 0.61} {'loss': 0.77, 'learning_rate': 7.119147766232832e-06, 'epoch': 0.61} {'loss': 0.7876, 'learning_rate': 7.113181637772721e-06, 'epoch': 0.61} {'loss': 0.8218, 'learning_rate': 7.107216630074895e-06, 'epoch': 0.61} {'loss': 0.6995, 'learning_rate': 7.1012527454551795e-06, 'epoch': 0.61} {'loss': 0.8374, 'learning_rate': 7.09528998622895e-06, 'epoch': 0.61} {'loss': 0.7705, 'learning_rate': 7.089328354711159e-06, 'epoch': 0.61} {'loss': 0.8096, 'learning_rate': 7.083367853216323e-06, 'epoch': 0.61} {'loss': 0.8105, 'learning_rate': 7.077408484058505e-06, 'epoch': 0.61} {'loss': 0.8066, 'learning_rate': 7.071450249551342e-06, 'epoch': 0.61} {'loss': 0.7876, 'learning_rate': 7.065493152008026e-06, 'epoch': 0.61} {'loss': 0.771, 'learning_rate': 7.059537193741306e-06, 'epoch': 0.61} {'loss': 0.7979, 'learning_rate': 7.053582377063489e-06, 'epoch': 0.61} {'loss': 0.791, 'learning_rate': 7.047628704286446e-06, 'epoch': 0.61} {'loss': 0.7866, 'learning_rate': 7.041676177721588e-06, 'epoch': 
0.61} {'loss': 0.8213, 'learning_rate': 7.035724799679898e-06, 'epoch': 0.61} {'loss': 0.7427, 'learning_rate': 7.029774572471904e-06, 'epoch': 0.61} {'loss': 0.7603, 'learning_rate': 7.023825498407689e-06, 'epoch': 0.61} {'loss': 0.8232, 'learning_rate': 7.0178775797968855e-06, 'epoch': 0.61} {'loss': 0.7515, 'learning_rate': 7.011930818948688e-06, 'epoch': 0.61} {'loss': 0.2589, 'learning_rate': 7.005985218171825e-06, 'epoch': 0.61} {'loss': 0.7744, 'learning_rate': 7.000040779774591e-06, 'epoch': 0.61} {'loss': 0.8145, 'learning_rate': 6.994097506064812e-06, 'epoch': 0.61} {'loss': 0.8003, 'learning_rate': 6.9881553993498805e-06, 'epoch': 0.61} {'loss': 0.2467, 'learning_rate': 6.9822144619367275e-06, 'epoch': 0.61} [2024-01-31 08:10:01,758] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7744, 'learning_rate': 6.97627469613182e-06, 'epoch': 0.61} [2024-01-31 08:10:19,482] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7754, 'learning_rate': 6.970336104241186e-06, 'epoch': 0.61} {'loss': 0.7529, 'learning_rate': 6.9643986885703955e-06, 'epoch': 0.61} [2024-01-31 08:10:54,457] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8179, 'learning_rate': 6.958462451424547e-06, 'epoch': 0.61} {'loss': 0.2697, 'learning_rate': 6.952527395108302e-06, 'epoch': 0.61} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1560445513.jpg' [2024-01-31 08:11:30,545] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8213, 'learning_rate': 6.9465935219258504e-06, 'epoch': 0.61} {'loss': 0.7129, 'learning_rate': 6.9406608341809215e-06, 'epoch': 0.61} [2024-01-31 08:12:06,435] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8066, 'learning_rate': 6.934729334176793e-06, 'epoch': 0.61} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/28624084.jpg' {'loss': 0.7822, 'learning_rate': 6.928799024216282e-06, 'epoch': 0.61} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/934710171.jpg' [2024-01-31 08:12:49,165] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7085, 'learning_rate': 6.92286990660173e-06, 'epoch': 0.61} [2024-01-31 08:13:07,504] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7617, 'learning_rate': 6.91694198363503e-06, 'epoch': 0.61} {'loss': 0.7197, 'learning_rate': 6.911015257617606e-06, 'epoch': 0.61} {'loss': 0.7739, 'learning_rate': 6.905089730850416e-06, 'epoch': 0.61} {'loss': 0.7402, 'learning_rate': 6.8991654056339505e-06, 'epoch': 0.61} {'loss': 0.7759, 'learning_rate': 6.893242284268244e-06, 'epoch': 0.61} {'loss': 0.813, 'learning_rate': 6.887320369052848e-06, 'epoch': 0.61} {'loss': 0.8125, 'learning_rate': 6.8813996622868584e-06, 'epoch': 0.61} {'loss': 0.7896, 'learning_rate': 6.8754801662688964e-06, 'epoch': 0.61} {'loss': 0.7559, 'learning_rate': 6.869561883297116e-06, 'epoch': 0.61} {'loss': 0.7803, 'learning_rate': 6.863644815669197e-06, 'epoch': 0.61} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/933478186.jpg' {'loss': 0.8003, 'learning_rate': 6.857728965682344e-06, 'epoch': 0.61} WARNING: tokenization mismatch: 1 vs. 1419. (ignored) WARNING: tokenization mismatch: 1 vs. 737. 
(ignored) {'loss': 0.8184, 'learning_rate': 6.851814335633298e-06, 'epoch': 0.61} {'loss': 0.7573, 'learning_rate': 6.8459009278183275e-06, 'epoch': 0.61} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/345414810.jpg' {'loss': 0.77, 'learning_rate': 6.839988744533211e-06, 'epoch': 0.61} {'loss': 0.2461, 'learning_rate': 6.834077788073268e-06, 'epoch': 0.61} {'loss': 0.7959, 'learning_rate': 6.828168060733336e-06, 'epoch': 0.61} {'loss': 0.832, 'learning_rate': 6.822259564807768e-06, 'epoch': 0.61} {'loss': 0.8032, 'learning_rate': 6.81635230259045e-06, 'epoch': 0.62} {'loss': 0.792, 'learning_rate': 6.810446276374789e-06, 'epoch': 0.62} {'loss': 0.8037, 'learning_rate': 6.8045414884536975e-06, 'epoch': 0.62} {'loss': 0.2931, 'learning_rate': 6.7986379411196255e-06, 'epoch': 0.62} {'loss': 0.79, 'learning_rate': 6.7927356366645315e-06, 'epoch': 0.62} {'loss': 0.7827, 'learning_rate': 6.786834577379893e-06, 'epoch': 0.62} {'loss': 0.8037, 'learning_rate': 6.780934765556702e-06, 'epoch': 0.62} {'loss': 0.7969, 'learning_rate': 6.775036203485472e-06, 'epoch': 0.62} {'loss': 0.7817, 'learning_rate': 6.769138893456225e-06, 'epoch': 0.62} {'loss': 0.7886, 'learning_rate': 6.763242837758504e-06, 'epoch': 0.62} {'loss': 0.7603, 'learning_rate': 6.757348038681357e-06, 'epoch': 0.62} {'loss': 0.8081, 'learning_rate': 6.751454498513349e-06, 'epoch': 0.62} {'loss': 0.7078, 'learning_rate': 6.745562219542554e-06, 'epoch': 0.62} {'loss': 0.7651, 'learning_rate': 6.7396712040565625e-06, 'epoch': 0.62} [2024-01-31 08:22:42,305] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7393, 'learning_rate': 6.733781454342463e-06, 'epoch': 0.62} {'loss': 0.7656, 'learning_rate': 6.727892972686861e-06, 'epoch': 0.62} {'loss': 0.7607, 'learning_rate': 6.722005761375873e-06, 'epoch': 0.62} {'loss': 0.772, 'learning_rate': 6.716119822695111e-06, 'epoch': 0.62} {'loss': 0.8169, 'learning_rate': 6.710235158929703e-06, 'epoch': 0.62} {'loss': 0.79, 'learning_rate': 6.704351772364274e-06, 'epoch': 0.62} {'loss': 0.8022, 'learning_rate': 6.698469665282958e-06, 'epoch': 0.62} {'loss': 0.7393, 'learning_rate': 6.692588839969397e-06, 'epoch': 0.62} {'loss': 0.7729, 'learning_rate': 6.6867092987067214e-06, 'epoch': 0.62} {'loss': 0.7573, 'learning_rate': 6.680831043777579e-06, 'epoch': 0.62} {'loss': 0.769, 'learning_rate': 6.674954077464108e-06, 'epoch': 0.62} {'loss': 0.772, 'learning_rate': 6.6690784020479484e-06, 'epoch': 0.62} {'loss': 0.7109, 'learning_rate': 6.6632040198102364e-06, 'epoch': 0.62} {'loss': 0.7427, 'learning_rate': 6.657330933031619e-06, 'epoch': 0.62} {'loss': 0.7896, 'learning_rate': 6.651459143992221e-06, 'epoch': 0.62} {'loss': 0.2661, 'learning_rate': 6.645588654971677e-06, 'epoch': 0.62} {'loss': 0.7866, 'learning_rate': 6.639719468249115e-06, 'epoch': 0.62} {'loss': 0.7915, 'learning_rate': 6.633851586103153e-06, 'epoch': 0.62} {'loss': 0.79, 'learning_rate': 6.627985010811903e-06, 'epoch': 0.62} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1559920696.jpg' {'loss': 0.7734, 'learning_rate': 6.622119744652977e-06, 'epoch': 0.62} {'loss': 0.7642, 'learning_rate': 6.616255789903467e-06, 'epoch': 0.62} {'loss': 0.7871, 'learning_rate': 6.610393148839964e-06, 'epoch': 0.62} {'loss': 0.7729, 'learning_rate': 6.6045318237385526e-06, 'epoch': 0.62} {'loss': 0.8286, 'learning_rate': 6.598671816874794e-06, 'epoch': 0.62} {'loss': 0.8428, 'learning_rate': 6.5928131305237465e-06, 'epoch': 0.62} {'loss': 0.7964, 'learning_rate': 6.586955766959958e-06, 'epoch': 0.62} {'loss': 0.814, 'learning_rate': 6.581099728457451e-06, 'epoch': 0.62} {'loss': 0.8066, 'learning_rate': 6.5752450172897466e-06, 'epoch': 0.62} {'loss': 0.7568, 'learning_rate': 6.569391635729847e-06, 'epoch': 0.62} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/870408712.jpg' {'loss': 0.7783, 'learning_rate': 6.563539586050233e-06, 'epoch': 0.62} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/739715534.jpg' {'loss': 0.7905, 'learning_rate': 6.557688870522871e-06, 'epoch': 0.62} {'loss': 0.793, 'learning_rate': 6.551839491419213e-06, 'epoch': 0.62} {'loss': 0.7808, 'learning_rate': 6.545991451010185e-06, 'epoch': 0.62} {'loss': 0.7847, 'learning_rate': 6.5401447515662065e-06, 'epoch': 0.62} {'loss': 0.7988, 'learning_rate': 6.5342993953571556e-06, 'epoch': 0.62} {'loss': 0.8188, 'learning_rate': 6.52845538465241e-06, 'epoch': 0.62} {'loss': 0.7427, 'learning_rate': 6.522612721720813e-06, 'epoch': 0.62} {'loss': 0.7495, 'learning_rate': 6.5167714088306865e-06, 'epoch': 0.62} {'loss': 0.811, 'learning_rate': 6.51093144824983e-06, 'epoch': 0.63} {'loss': 0.769, 'learning_rate': 6.505092842245519e-06, 'epoch': 0.63} {'loss': 0.7227, 'learning_rate': 6.499255593084498e-06, 'epoch': 0.63} [2024-01-31 08:35:08,310] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. 
this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7681, 'learning_rate': 6.493419703032991e-06, 'epoch': 0.63} [2024-01-31 08:35:25,206] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8193, 'learning_rate': 6.487585174356691e-06, 'epoch': 0.63} {'loss': 0.239, 'learning_rate': 6.481752009320761e-06, 'epoch': 0.63} {'loss': 0.8193, 'learning_rate': 6.4759202101898366e-06, 'epoch': 0.63} {'loss': 0.75, 'learning_rate': 6.4700897792280285e-06, 'epoch': 0.63} {'loss': 0.7983, 'learning_rate': 6.464260718698902e-06, 'epoch': 0.63} {'loss': 0.748, 'learning_rate': 6.458433030865503e-06, 'epoch': 0.63} {'loss': 0.7749, 'learning_rate': 6.452606717990346e-06, 'epoch': 0.63} {'loss': 0.8013, 'learning_rate': 6.4467817823354005e-06, 'epoch': 0.63} {'loss': 0.7954, 'learning_rate': 6.440958226162104e-06, 'epoch': 0.63} {'loss': 0.8066, 'learning_rate': 6.43513605173137e-06, 'epoch': 0.63} {'loss': 0.77, 'learning_rate': 6.4293152613035594e-06, 'epoch': 0.63} {'loss': 0.8394, 'learning_rate': 6.4234958571385095e-06, 'epoch': 0.63} {'loss': 0.6873, 'learning_rate': 6.4176778414955075e-06, 'epoch': 0.63} {'loss': 0.8389, 'learning_rate': 6.4118612166333124e-06, 'epoch': 0.63} {'loss': 0.7632, 'learning_rate': 6.4060459848101354e-06, 'epoch': 0.63} {'loss': 0.7529, 'learning_rate': 6.400232148283651e-06, 'epoch': 0.63} {'loss': 0.2933, 'learning_rate': 6.3944197093109885e-06, 'epoch': 0.63} {'loss': 0.7769, 'learning_rate': 6.388608670148741e-06, 'epoch': 0.63} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/3980621154.jpg' {'loss': 0.7505, 'learning_rate': 6.38279903305295e-06, 'epoch': 0.63} {'loss': 0.7798, 'learning_rate': 6.376990800279119e-06, 'epoch': 0.63} {'loss': 0.8071, 'learning_rate': 6.3711839740822035e-06, 'epoch': 0.63} {'loss': 0.751, 'learning_rate': 6.3653785567166125e-06, 'epoch': 0.63} {'loss': 0.7598, 'learning_rate': 6.359574550436209e-06, 'epoch': 0.63} {'loss': 0.7729, 'learning_rate': 6.3537719574943105e-06, 'epoch': 0.63} {'loss': 0.7954, 'learning_rate': 6.347970780143678e-06, 'epoch': 0.63} {'loss': 0.8027, 'learning_rate': 6.342171020636533e-06, 'epoch': 0.63} {'loss': 0.751, 'learning_rate': 6.336372681224543e-06, 'epoch': 0.63} {'loss': 0.8599, 'learning_rate': 6.330575764158819e-06, 'epoch': 0.63} {'loss': 0.73, 'learning_rate': 6.324780271689923e-06, 'epoch': 0.63} {'loss': 0.7988, 'learning_rate': 6.318986206067872e-06, 'epoch': 0.63} {'loss': 0.8315, 'learning_rate': 6.313193569542113e-06, 'epoch': 0.63} {'loss': 0.7598, 'learning_rate': 6.30740236436155e-06, 'epoch': 0.63} {'loss': 0.8047, 'learning_rate': 6.301612592774533e-06, 'epoch': 0.63} {'loss': 0.8188, 'learning_rate': 6.295824257028844e-06, 'epoch': 0.63} {'loss': 0.748, 'learning_rate': 6.290037359371717e-06, 'epoch': 0.63} {'loss': 0.731, 'learning_rate': 
6.284251902049827e-06, 'epoch': 0.63} {'loss': 0.7744, 'learning_rate': 6.278467887309283e-06, 'epoch': 0.63} {'loss': 0.79, 'learning_rate': 6.272685317395644e-06, 'epoch': 0.63} {'loss': 0.7881, 'learning_rate': 6.266904194553896e-06, 'epoch': 0.63} {'loss': 0.8081, 'learning_rate': 6.261124521028477e-06, 'epoch': 0.63} {'loss': 0.7021, 'learning_rate': 6.255346299063252e-06, 'epoch': 0.63} {'loss': 0.7642, 'learning_rate': 6.249569530901525e-06, 'epoch': 0.63} {'loss': 0.79, 'learning_rate': 6.243794218786034e-06, 'epoch': 0.63} {'loss': 0.7935, 'learning_rate': 6.238020364958964e-06, 'epoch': 0.63} {'loss': 0.7876, 'learning_rate': 6.232247971661912e-06, 'epoch': 0.63} {'loss': 0.748, 'learning_rate': 6.2264770411359256e-06, 'epoch': 0.63} {'loss': 0.811, 'learning_rate': 6.22070757562148e-06, 'epoch': 0.63} {'loss': 0.8047, 'learning_rate': 6.214939577358479e-06, 'epoch': 0.63} {'loss': 0.811, 'learning_rate': 6.209173048586253e-06, 'epoch': 0.64} {'loss': 0.7607, 'learning_rate': 6.203407991543577e-06, 'epoch': 0.64} {'loss': 0.791, 'learning_rate': 6.197644408468635e-06, 'epoch': 0.64} {'loss': 0.7832, 'learning_rate': 6.191882301599052e-06, 'epoch': 0.64} {'loss': 0.771, 'learning_rate': 6.186121673171882e-06, 'epoch': 0.64} {'loss': 0.7681, 'learning_rate': 6.180362525423591e-06, 'epoch': 0.64} {'loss': 0.7451, 'learning_rate': 6.174604860590081e-06, 'epoch': 0.64} {'loss': 0.7607, 'learning_rate': 6.168848680906678e-06, 'epoch': 0.64} {'loss': 0.8003, 'learning_rate': 6.163093988608127e-06, 'epoch': 0.64} {'loss': 0.7905, 'learning_rate': 6.157340785928595e-06, 'epoch': 0.64} {'loss': 0.7993, 'learning_rate': 6.151589075101681e-06, 'epoch': 0.64} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/393701719.jpg' {'loss': 0.8169, 'learning_rate': 6.145838858360391e-06, 'epoch': 0.64} {'loss': 0.7749, 'learning_rate': 6.140090137937158e-06, 'epoch': 0.64} {'loss': 0.8096, 'learning_rate': 6.134342916063838e-06, 'epoch': 0.64} {'loss': 0.2665, 'learning_rate': 6.128597194971691e-06, 'epoch': 0.64} {'loss': 0.814, 'learning_rate': 6.122852976891413e-06, 'epoch': 0.64} {'loss': 0.7617, 'learning_rate': 6.117110264053101e-06, 'epoch': 0.64} {'loss': 0.7661, 'learning_rate': 6.111369058686276e-06, 'epoch': 0.64} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/28608194.jpg' {'loss': 0.7822, 'learning_rate': 6.105629363019875e-06, 'epoch': 0.64} {'loss': 0.7593, 'learning_rate': 6.099891179282242e-06, 'epoch': 0.64} {'loss': 0.7417, 'learning_rate': 6.094154509701133e-06, 'epoch': 0.64} {'loss': 0.8477, 'learning_rate': 6.088419356503732e-06, 'epoch': 0.64} [2024-01-31 08:56:49,377] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.6963, 'learning_rate': 6.082685721916612e-06, 'epoch': 0.64} [2024-01-31 08:57:07,455] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7529, 'learning_rate': 6.076953608165772e-06, 'epoch': 0.64} {'loss': 0.793, 'learning_rate': 6.07122301747662e-06, 'epoch': 0.64} {'loss': 0.7988, 'learning_rate': 6.065493952073961e-06, 'epoch': 0.64} {'loss': 0.7349, 'learning_rate': 6.0597664141820176e-06, 'epoch': 0.64} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/553353500.jpg' {'loss': 0.791, 'learning_rate': 6.054040406024422e-06, 'epoch': 0.64} {'loss': 0.8477, 'learning_rate': 6.0483159298242e-06, 'epoch': 0.64} {'loss': 0.7388, 'learning_rate': 6.042592987803796e-06, 'epoch': 0.64} {'loss': 0.8291, 'learning_rate': 6.036871582185054e-06, 'epoch': 0.64} {'loss': 0.709, 'learning_rate': 6.031151715189217e-06, 'epoch': 0.64} {'loss': 0.7783, 'learning_rate': 6.025433389036935e-06, 'epoch': 0.64} {'loss': 0.7588, 'learning_rate': 6.019716605948261e-06, 'epoch': 0.64} {'loss': 0.7705, 'learning_rate': 6.014001368142643e-06, 'epoch': 0.64} {'loss': 0.7534, 'learning_rate': 6.008287677838937e-06, 'epoch': 0.64} {'loss': 0.7222, 'learning_rate': 6.002575537255395e-06, 'epoch': 0.64} {'loss': 0.7803, 'learning_rate': 5.996864948609662e-06, 'epoch': 0.64} {'loss': 0.7134, 'learning_rate': 5.9911559141187924e-06, 'epoch': 0.64} {'loss': 0.2648, 'learning_rate': 5.9854484359992235e-06, 'epoch': 0.64} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/715308904.jpg' {'loss': 0.7588, 'learning_rate': 5.979742516466793e-06, 'epoch': 0.64} {'loss': 0.7612, 'learning_rate': 5.974038157736746e-06, 'epoch': 0.64} {'loss': 0.8159, 'learning_rate': 5.968335362023697e-06, 'epoch': 0.64} {'loss': 0.7632, 'learning_rate': 5.962634131541676e-06, 'epoch': 0.64} {'loss': 0.769, 'learning_rate': 5.956934468504101e-06, 'epoch': 0.64} {'loss': 0.7568, 'learning_rate': 5.951236375123768e-06, 'epoch': 0.64} {'loss': 0.2635, 'learning_rate': 5.945539853612876e-06, 'epoch': 0.64} {'loss': 0.6934, 'learning_rate': 5.939844906183016e-06, 'epoch': 0.64} {'loss': 0.7773, 'learning_rate': 5.934151535045156e-06, 'epoch': 0.64} {'loss': 0.8311, 'learning_rate': 5.92845974240966e-06, 'epoch': 0.64} {'loss': 0.7651, 'learning_rate': 5.922769530486283e-06, 'epoch': 0.64} {'loss': 0.7393, 'learning_rate': 5.917080901484156e-06, 'epoch': 0.64} {'loss': 0.7876, 'learning_rate': 5.9113938576118e-06, 'epoch': 0.65} {'loss': 0.751, 'learning_rate': 5.905708401077128e-06, 'epoch': 0.65} {'loss': 0.7817, 'learning_rate': 5.900024534087421e-06, 'epoch': 0.65} {'loss': 0.8042, 'learning_rate': 5.894342258849355e-06, 'epoch': 0.65} {'loss': 0.7495, 'learning_rate': 5.88866157756899e-06, 'epoch': 0.65} {'loss': 0.793, 'learning_rate': 5.882982492451757e-06, 'epoch': 0.65} {'loss': 0.2579, 'learning_rate': 5.877305005702471e-06, 'epoch': 0.65} {'loss': 0.7749, 'learning_rate': 5.871629119525335e-06, 'epoch': 0.65} {'loss': 0.7178, 'learning_rate': 5.865954836123915e-06, 'epoch': 0.65} {'loss': 0.8013, 'learning_rate': 5.860282157701167e-06, 'epoch': 0.65} {'loss': 0.7969, 'learning_rate': 5.854611086459423e-06, 'epoch': 0.65} {'loss': 0.792, 'learning_rate': 5.8489416246003814e-06, 'epoch': 0.65} {'loss': 0.7852, 'learning_rate': 5.8432737743251315e-06, 'epoch': 0.65} {'loss': 0.2745, 'learning_rate': 5.8376075378341194e-06, 'epoch': 0.65} {'loss': 0.7974, 'learning_rate': 5.831942917327172e-06, 'epoch': 0.65} {'loss': 0.7842, 'learning_rate': 
5.826279915003503e-06, 'epoch': 0.65} {'loss': 0.8076, 'learning_rate': 5.8206185330616725e-06, 'epoch': 0.65} {'loss': 0.7935, 'learning_rate': 5.814958773699625e-06, 'epoch': 0.65} {'loss': 0.2802, 'learning_rate': 5.809300639114683e-06, 'epoch': 0.65} {'loss': 0.7739, 'learning_rate': 5.803644131503516e-06, 'epoch': 0.65} {'loss': 0.79, 'learning_rate': 5.797989253062186e-06, 'epoch': 0.65} {'loss': 0.8311, 'learning_rate': 5.792336005986105e-06, 'epoch': 0.65} {'loss': 0.8252, 'learning_rate': 5.786684392470064e-06, 'epoch': 0.65} {'loss': 0.7822, 'learning_rate': 5.781034414708208e-06, 'epoch': 0.65} {'loss': 0.8013, 'learning_rate': 5.775386074894058e-06, 'epoch': 0.65} {'loss': 0.2922, 'learning_rate': 5.769739375220489e-06, 'epoch': 0.65} {'loss': 0.77, 'learning_rate': 5.7640943178797445e-06, 'epoch': 0.65} {'loss': 0.7783, 'learning_rate': 5.7584509050634395e-06, 'epoch': 0.65} {'loss': 0.7207, 'learning_rate': 5.752809138962525e-06, 'epoch': 0.65} {'loss': 0.8047, 'learning_rate': 5.747169021767342e-06, 'epoch': 0.65} [2024-01-31 09:15:07,153] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7715, 'learning_rate': 5.7415305556675805e-06, 'epoch': 0.65} [2024-01-31 09:15:25,596] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7871, 'learning_rate': 5.73589374285227e-06, 'epoch': 0.65} {'loss': 0.833, 'learning_rate': 5.730258585509832e-06, 'epoch': 0.65} {'loss': 0.7964, 'learning_rate': 5.724625085828022e-06, 'epoch': 0.65} {'loss': 0.791, 'learning_rate': 5.718993245993958e-06, 'epoch': 0.65} {'loss': 0.7686, 'learning_rate': 5.713363068194115e-06, 'epoch': 0.65} {'loss': 0.8271, 'learning_rate': 5.7077345546143235e-06, 'epoch': 0.65} {'loss': 0.8311, 'learning_rate': 5.702107707439766e-06, 'epoch': 0.65} {'loss': 0.7256, 'learning_rate': 5.6964825288549745e-06, 'epoch': 0.65} {'loss': 0.7178, 'learning_rate': 5.690859021043842e-06, 'epoch': 0.65} {'loss': 0.7554, 'learning_rate': 5.685237186189601e-06, 'epoch': 0.65} {'loss': 0.7266, 'learning_rate': 5.679617026474853e-06, 'epoch': 0.65} {'loss': 0.7739, 'learning_rate': 5.673998544081527e-06, 'epoch': 0.65} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1566864941.jpg' {'loss': 0.7656, 'learning_rate': 5.6683817411909114e-06, 'epoch': 0.65} {'loss': 0.2726, 'learning_rate': 5.662766619983653e-06, 'epoch': 0.65} {'loss': 0.79, 'learning_rate': 5.65715318263972e-06, 'epoch': 0.65} {'loss': 0.2704, 'learning_rate': 5.651541431338454e-06, 'epoch': 0.65} {'loss': 0.8428, 'learning_rate': 5.645931368258527e-06, 'epoch': 0.65} {'loss': 0.7368, 'learning_rate': 5.640322995577958e-06, 'epoch': 0.65} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/345351452.jpg' {'loss': 0.7852, 'learning_rate': 5.634716315474109e-06, 'epoch': 0.65} {'loss': 0.7939, 'learning_rate': 5.629111330123689e-06, 'epoch': 0.65} {'loss': 0.7646, 'learning_rate': 5.623508041702743e-06, 'epoch': 0.65} {'loss': 0.8071, 'learning_rate': 5.617906452386659e-06, 'epoch': 0.66} {'loss': 0.2887, 'learning_rate': 5.612306564350179e-06, 'epoch': 0.66} {'loss': 0.243, 'learning_rate': 5.6067083797673535e-06, 'epoch': 0.66} {'loss': 0.7837, 'learning_rate': 5.601111900811607e-06, 'epoch': 0.66} {'loss': 0.2366, 'learning_rate': 5.595517129655681e-06, 'epoch': 0.66} {'loss': 0.7271, 'learning_rate': 5.589924068471648e-06, 'epoch': 0.66} {'loss': 0.7339, 'learning_rate': 5.58433271943094e-06, 'epoch': 0.66} {'loss': 0.8013, 'learning_rate': 5.578743084704306e-06, 'epoch': 0.66} {'loss': 0.7324, 'learning_rate': 5.573155166461833e-06, 'epoch': 0.66} {'loss': 0.7749, 'learning_rate': 5.567568966872947e-06, 'epoch': 0.66} {'loss': 0.7275, 'learning_rate': 5.5619844881064e-06, 'epoch': 0.66} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/809235269.jpg' {'loss': 0.8062, 'learning_rate': 5.556401732330281e-06, 'epoch': 0.66} {'loss': 0.8027, 'learning_rate': 5.550820701712007e-06, 'epoch': 0.66} {'loss': 0.6646, 'learning_rate': 5.545241398418326e-06, 'epoch': 0.66} {'loss': 0.7402, 'learning_rate': 5.539663824615312e-06, 'epoch': 0.66} {'loss': 0.7358, 'learning_rate': 5.534087982468384e-06, 'epoch': 0.66} {'loss': 0.6948, 'learning_rate': 5.5285138741422615e-06, 'epoch': 0.66} {'loss': 0.75, 'learning_rate': 5.522941501801008e-06, 'epoch': 0.66} {'loss': 0.2468, 'learning_rate': 5.517370867608021e-06, 'epoch': 0.66} {'loss': 0.814, 'learning_rate': 5.511801973725997e-06, 'epoch': 0.66} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/761307842.jpg' {'loss': 0.7388, 'learning_rate': 5.506234822316983e-06, 'epoch': 0.66} 
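The recurring "[Errno 2] No such file or directory: './playground/data/ocr_vqa/images/...'" lines indicate that some OCR-VQA images referenced by the annotation file are not on disk, so those samples fail mid-epoch. One way to avoid this is to filter the annotation list before training starts. The sketch below assumes a LLaVA-style annotation JSON (a list of dicts with an optional "image" key resolved against an image root); the annotation and output file names are placeholders.

# Sketch only: drop samples whose image file is missing so the dataloader never
# hits [Errno 2] during training. Paths are placeholders; the "image" key layout
# is assumed to follow the LLaVA mixture format, not confirmed by this log.
import json
import os

ANNOTATIONS = "path/to/annotations.json"             # placeholder
IMAGE_ROOT = "./playground/data"                     # matches the error paths above
FILTERED_OUT = "path/to/annotations_filtered.json"   # placeholder

with open(ANNOTATIONS) as f:
    samples = json.load(f)

kept = [
    s for s in samples
    if "image" not in s or os.path.exists(os.path.join(IMAGE_ROOT, s["image"]))
]

print(f"kept {len(kept)} of {len(samples)} samples")
with open(FILTERED_OUT, "w") as f:
    json.dump(kept, f)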
{'loss': 0.6855, 'learning_rate': 5.500669415542336e-06, 'epoch': 0.66} {'loss': 0.7549, 'learning_rate': 5.495105755562738e-06, 'epoch': 0.66} {'loss': 0.7065, 'learning_rate': 5.4895438445381945e-06, 'epoch': 0.66} {'loss': 0.772, 'learning_rate': 5.48398368462803e-06, 'epoch': 0.66} {'loss': 0.2856, 'learning_rate': 5.4784252779908905e-06, 'epoch': 0.66} {'loss': 0.7808, 'learning_rate': 5.4728686267847354e-06, 'epoch': 0.66} {'loss': 0.7524, 'learning_rate': 5.467313733166863e-06, 'epoch': 0.66} {'loss': 0.7773, 'learning_rate': 5.461760599293855e-06, 'epoch': 0.66} {'loss': 0.8066, 'learning_rate': 5.456209227321643e-06, 'epoch': 0.66} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/892390263.jpg' {'loss': 0.8242, 'learning_rate': 5.450659619405458e-06, 'epoch': 0.66} {'loss': 0.8345, 'learning_rate': 5.445111777699842e-06, 'epoch': 0.66} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/391040952.jpg' {'loss': 0.834, 'learning_rate': 5.439565704358667e-06, 'epoch': 0.66} {'loss': 0.8286, 'learning_rate': 5.434021401535105e-06, 'epoch': 0.66} {'loss': 0.75, 'learning_rate': 5.428478871381646e-06, 'epoch': 0.66} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/670886939.jpg' {'loss': 0.7773, 'learning_rate': 5.422938116050092e-06, 'epoch': 0.66} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/930031571.jpg' {'loss': 0.8159, 'learning_rate': 5.417399137691552e-06, 'epoch': 0.66} {'loss': 0.772, 'learning_rate': 5.411861938456453e-06, 'epoch': 0.66} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/3980621146.jpg' {'loss': 0.7676, 'learning_rate': 5.406326520494522e-06, 'epoch': 0.66} {'loss': 0.8608, 'learning_rate': 5.400792885954802e-06, 'epoch': 0.66} [2024-01-31 09:34:14,682] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7573, 'learning_rate': 5.395261036985635e-06, 'epoch': 0.66} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/962289027.jpg' {'loss': 0.7222, 'learning_rate': 5.389730975734686e-06, 'epoch': 0.66} {'loss': 0.7012, 'learning_rate': 5.384202704348902e-06, 'epoch': 0.66} {'loss': 0.7881, 'learning_rate': 5.378676224974557e-06, 'epoch': 0.66} {'loss': 0.7314, 'learning_rate': 5.373151539757224e-06, 'epoch': 0.66} {'loss': 0.7949, 'learning_rate': 5.367628650841761e-06, 'epoch': 0.66} {'loss': 0.7354, 'learning_rate': 5.362107560372358e-06, 'epoch': 0.66} {'loss': 0.8096, 'learning_rate': 5.356588270492487e-06, 'epoch': 0.66} {'loss': 0.6953, 'learning_rate': 5.351070783344926e-06, 'epoch': 0.66} {'loss': 0.7578, 'learning_rate': 5.3455551010717545e-06, 'epoch': 0.66} {'loss': 0.7705, 'learning_rate': 5.34004122581435e-06, 'epoch': 0.66} {'loss': 0.8052, 'learning_rate': 5.334529159713389e-06, 'epoch': 0.66} {'loss': 0.7402, 'learning_rate': 5.329018904908841e-06, 'epoch': 0.67} {'loss': 0.7915, 'learning_rate': 5.323510463539989e-06, 'epoch': 0.67} {'loss': 0.7554, 'learning_rate': 5.318003837745382e-06, 'epoch': 0.67} {'loss': 0.7642, 'learning_rate': 5.3124990296628974e-06, 'epoch': 0.67} {'loss': 0.7734, 'learning_rate': 5.306996041429688e-06, 'epoch': 0.67} {'loss': 0.7075, 'learning_rate': 5.301494875182192e-06, 'epoch': 0.67} {'loss': 0.7847, 'learning_rate': 5.295995533056162e-06, 'epoch': 0.67} {'loss': 0.7886, 'learning_rate': 5.290498017186631e-06, 'epoch': 0.67} {'loss': 0.8091, 'learning_rate': 5.2850023297079235e-06, 'epoch': 0.67} {'loss': 0.7769, 'learning_rate': 5.279508472753654e-06, 'epoch': 0.67} {'loss': 0.7788, 'learning_rate': 5.274016448456725e-06, 'epoch': 0.67} {'loss': 0.2549, 'learning_rate': 5.2685262589493314e-06, 'epoch': 0.67} {'loss': 0.8008, 'learning_rate': 5.263037906362953e-06, 'epoch': 0.67} {'loss': 0.7778, 'learning_rate': 5.257551392828359e-06, 'epoch': 0.67} {'loss': 0.8076, 'learning_rate': 5.252066720475597e-06, 'epoch': 0.67} [2024-01-31 09:42:33,207] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7905, 'learning_rate': 5.246583891434018e-06, 'epoch': 0.67} [2024-01-31 09:42:52,698] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7393, 'learning_rate': 5.241102907832232e-06, 'epoch': 0.67} {'loss': 0.7646, 'learning_rate': 5.235623771798151e-06, 'epoch': 0.67} {'loss': 0.2549, 'learning_rate': 5.23014648545897e-06, 'epoch': 0.67} {'loss': 0.7275, 'learning_rate': 5.224671050941146e-06, 'epoch': 0.67} {'loss': 0.7788, 'learning_rate': 5.2191974703704425e-06, 'epoch': 0.67} {'loss': 0.7339, 'learning_rate': 5.213725745871889e-06, 'epoch': 0.67} {'loss': 0.7593, 'learning_rate': 5.208255879569799e-06, 'epoch': 0.67} {'loss': 0.7632, 'learning_rate': 5.20278787358776e-06, 'epoch': 0.67} {'loss': 0.7759, 'learning_rate': 5.197321730048641e-06, 'epoch': 0.67} {'loss': 0.7207, 'learning_rate': 5.1918574510745865e-06, 'epoch': 0.67} {'loss': 0.8008, 'learning_rate': 5.186395038787017e-06, 'epoch': 0.67} {'loss': 0.7593, 'learning_rate': 5.180934495306638e-06, 'epoch': 0.67} {'loss': 0.7791, 'learning_rate': 5.175475822753404e-06, 'epoch': 0.67} {'loss': 0.8105, 'learning_rate': 5.170019023246574e-06, 'epoch': 0.67} {'loss': 0.8135, 'learning_rate': 5.16456409890466e-06, 'epoch': 0.67} {'loss': 0.7241, 'learning_rate': 5.159111051845451e-06, 'epoch': 0.67} {'loss': 0.7534, 'learning_rate': 5.153659884186013e-06, 'epoch': 0.67} {'loss': 0.7578, 'learning_rate': 5.148210598042665e-06, 'epoch': 0.67} {'loss': 0.7715, 'learning_rate': 5.142763195531017e-06, 'epoch': 0.67} {'loss': 0.7151, 'learning_rate': 5.137317678765939e-06, 'epoch': 0.67} {'loss': 0.8291, 'learning_rate': 5.131874049861563e-06, 'epoch': 0.67} {'loss': 0.8013, 'learning_rate': 5.126432310931295e-06, 'epoch': 0.67} {'loss': 0.772, 'learning_rate': 5.120992464087807e-06, 'epoch': 0.67} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/750223391.jpg' {'loss': 0.7192, 'learning_rate': 5.115554511443033e-06, 'epoch': 0.67} {'loss': 0.7803, 'learning_rate': 5.1101184551081705e-06, 'epoch': 0.67} {'loss': 0.7671, 'learning_rate': 5.104684297193694e-06, 'epoch': 0.67} {'loss': 0.7847, 'learning_rate': 5.099252039809317e-06, 'epoch': 0.67} {'loss': 0.8076, 'learning_rate': 5.09382168506404e-06, 'epoch': 0.67} {'loss': 0.7988, 'learning_rate': 5.088393235066114e-06, 'epoch': 0.67} [2024-01-31 09:51:47,739] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7949, 'learning_rate': 5.082966691923037e-06, 'epoch': 0.67} [2024-01-31 09:52:16,130] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7119, 'learning_rate': 5.077542057741592e-06, 'epoch': 0.67} {'loss': 0.7822, 'learning_rate': 5.0721193346278066e-06, 'epoch': 0.67} {'loss': 0.7354, 'learning_rate': 5.066698524686966e-06, 'epoch': 0.67} {'loss': 0.8257, 'learning_rate': 5.061279630023618e-06, 'epoch': 0.67} {'loss': 0.7671, 'learning_rate': 5.055862652741562e-06, 'epoch': 0.67} {'loss': 0.7617, 'learning_rate': 5.050447594943856e-06, 'epoch': 0.67} {'loss': 0.7808, 'learning_rate': 5.045034458732808e-06, 'epoch': 0.68} {'loss': 0.7339, 'learning_rate': 5.0396232462099945e-06, 'epoch': 0.68} {'loss': 0.7769, 'learning_rate': 5.034213959476222e-06, 'epoch': 0.68} {'loss': 0.7524, 'learning_rate': 5.028806600631569e-06, 'epoch': 0.68} {'loss': 0.27, 'learning_rate': 5.023401171775357e-06, 'epoch': 0.68} {'loss': 0.7607, 'learning_rate': 5.017997675006161e-06, 'epoch': 0.68} {'loss': 0.7637, 'learning_rate': 5.012596112421806e-06, 'epoch': 0.68} {'loss': 0.8379, 'learning_rate': 5.007196486119355e-06, 'epoch': 0.68} {'loss': 0.7236, 'learning_rate': 5.001798798195136e-06, 'epoch': 0.68} {'loss': 0.791, 'learning_rate': 4.996403050744719e-06, 'epoch': 0.68} {'loss': 0.7656, 'learning_rate': 4.991009245862917e-06, 'epoch': 0.68} {'loss': 0.7427, 'learning_rate': 4.985617385643789e-06, 'epoch': 0.68} {'loss': 0.7461, 'learning_rate': 4.980227472180643e-06, 'epoch': 0.68} {'loss': 0.7119, 'learning_rate': 4.974839507566027e-06, 'epoch': 0.68} {'loss': 0.2292, 'learning_rate': 4.969453493891733e-06, 'epoch': 0.68} {'loss': 0.8594, 'learning_rate': 4.9640694332488075e-06, 'epoch': 0.68} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1885928017.jpg' {'loss': 0.8315, 'learning_rate': 4.958687327727511e-06, 'epoch': 0.68} {'loss': 0.7373, 'learning_rate': 4.953307179417376e-06, 'epoch': 0.68} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/B01577TUTC.jpg' {'loss': 0.7617, 'learning_rate': 4.947928990407156e-06, 'epoch': 0.68} {'loss': 0.7578, 'learning_rate': 4.94255276278485e-06, 'epoch': 0.68} {'loss': 0.7793, 'learning_rate': 4.937178498637696e-06, 'epoch': 0.68} [2024-01-31 10:00:51,544] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7021, 'learning_rate': 4.931806200052165e-06, 'epoch': 0.68} [2024-01-31 10:01:10,619] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7051, 'learning_rate': 4.926435869113971e-06, 'epoch': 0.68} [2024-01-31 10:01:32,832] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. 
this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7651, 'learning_rate': 4.92106750790806e-06, 'epoch': 0.68} {'loss': 0.7515, 'learning_rate': 4.915701118518616e-06, 'epoch': 0.68} {'loss': 0.7603, 'learning_rate': 4.910336703029055e-06, 'epoch': 0.68} {'loss': 0.2528, 'learning_rate': 4.904974263522025e-06, 'epoch': 0.68} {'loss': 0.7642, 'learning_rate': 4.899613802079419e-06, 'epoch': 0.68} {'loss': 0.6978, 'learning_rate': 4.8942553207823395e-06, 'epoch': 0.68} {'loss': 0.7583, 'learning_rate': 4.888898821711144e-06, 'epoch': 0.68} {'loss': 0.7842, 'learning_rate': 4.883544306945407e-06, 'epoch': 0.68} {'loss': 0.7363, 'learning_rate': 4.878191778563934e-06, 'epoch': 0.68} {'loss': 0.7798, 'learning_rate': 4.872841238644766e-06, 'epoch': 0.68} [2024-01-31 10:04:43,640] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7695, 'learning_rate': 4.867492689265154e-06, 'epoch': 0.68} [2024-01-31 10:05:06,713] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.6865, 'learning_rate': 4.8621461325016015e-06, 'epoch': 0.68} {'loss': 0.2377, 'learning_rate': 4.856801570429822e-06, 'epoch': 0.68} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1557488789.jpg' {'loss': 0.7715, 'learning_rate': 4.851459005124759e-06, 'epoch': 0.68} {'loss': 0.7656, 'learning_rate': 4.846118438660578e-06, 'epoch': 0.68} {'loss': 0.7378, 'learning_rate': 4.840779873110675e-06, 'epoch': 0.68} {'loss': 0.2805, 'learning_rate': 4.83544331054766e-06, 'epoch': 0.68} {'loss': 0.7417, 'learning_rate': 4.83010875304337e-06, 'epoch': 0.68} {'loss': 0.7524, 'learning_rate': 4.824776202668875e-06, 'epoch': 0.68} {'loss': 0.7832, 'learning_rate': 4.819445661494437e-06, 'epoch': 0.68} {'loss': 0.6973, 'learning_rate': 4.8141171315895694e-06, 'epoch': 0.68} {'loss': 0.8052, 'learning_rate': 4.808790615022987e-06, 'epoch': 0.68} {'loss': 0.7092, 'learning_rate': 4.803466113862626e-06, 'epoch': 0.68} {'loss': 0.8096, 'learning_rate': 4.798143630175642e-06, 'epoch': 0.68} {'loss': 0.8003, 'learning_rate': 4.792823166028405e-06, 'epoch': 0.68} {'loss': 0.8086, 'learning_rate': 4.787504723486505e-06, 'epoch': 0.68} {'loss': 0.7681, 'learning_rate': 4.7821883046147414e-06, 'epoch': 0.68} {'loss': 0.7593, 'learning_rate': 4.776873911477133e-06, 'epoch': 0.68} {'loss': 0.7549, 'learning_rate': 4.771561546136908e-06, 'epoch': 0.68} {'loss': 0.7974, 'learning_rate': 4.766251210656509e-06, 'epoch': 0.69} {'loss': 0.7461, 'learning_rate': 4.760942907097601e-06, 'epoch': 0.69} {'loss': 0.7651, 'learning_rate': 4.755636637521035e-06, 'epoch': 0.69} [2024-01-31 10:11:32,893] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7271, 'learning_rate': 4.750332403986902e-06, 'epoch': 0.69} {'loss': 0.7974, 'learning_rate': 4.7450302085544735e-06, 'epoch': 0.69} [2024-01-31 10:12:08,538] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7324, 'learning_rate': 4.739730053282255e-06, 'epoch': 0.69} {'loss': 0.7441, 'learning_rate': 4.734431940227951e-06, 'epoch': 0.69} [2024-01-31 10:12:45,470] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7896, 'learning_rate': 4.7291358714484594e-06, 'epoch': 0.69} [2024-01-31 10:13:04,043] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7539, 'learning_rate': 4.723841848999907e-06, 'epoch': 0.69} [2024-01-31 10:13:24,655] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7935, 'learning_rate': 4.718549874937612e-06, 'epoch': 0.69} {'loss': 0.811, 'learning_rate': 4.713259951316103e-06, 'epoch': 0.69} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/679441662.jpg' {'loss': 0.7578, 'learning_rate': 4.707972080189106e-06, 'epoch': 0.69} {'loss': 0.2488, 'learning_rate': 4.702686263609559e-06, 'epoch': 0.69} {'loss': 0.7466, 'learning_rate': 4.697402503629596e-06, 'epoch': 0.69} {'loss': 0.7905, 'learning_rate': 4.69212080230055e-06, 'epoch': 0.69} {'loss': 0.7085, 'learning_rate': 4.686841161672974e-06, 'epoch': 0.69} {'loss': 0.7671, 'learning_rate': 4.681563583796587e-06, 'epoch': 0.69} {'loss': 0.7632, 'learning_rate': 4.67628807072034e-06, 'epoch': 0.69} {'loss': 0.7139, 'learning_rate': 4.6710146244923645e-06, 'epoch': 0.69} {'loss': 0.7441, 'learning_rate': 4.665743247159995e-06, 'epoch': 0.69} {'loss': 0.7427, 'learning_rate': 4.660473940769761e-06, 'epoch': 0.69} {'loss': 0.8398, 'learning_rate': 4.655206707367388e-06, 'epoch': 0.69} {'loss': 0.7275, 'learning_rate': 4.649941548997797e-06, 'epoch': 0.69} {'loss': 0.7607, 'learning_rate': 4.644678467705101e-06, 'epoch': 0.69} {'loss': 0.7744, 'learning_rate': 4.639417465532622e-06, 'epoch': 0.69} {'loss': 0.2498, 'learning_rate': 4.634158544522849e-06, 'epoch': 0.69} {'loss': 0.7202, 'learning_rate': 4.628901706717476e-06, 'epoch': 0.69} {'loss': 0.7622, 'learning_rate': 4.623646954157399e-06, 'epoch': 0.69} {'loss': 0.2528, 'learning_rate': 4.618394288882681e-06, 'epoch': 0.69} {'loss': 0.7783, 'learning_rate': 4.613143712932603e-06, 'epoch': 0.69} {'loss': 0.7959, 'learning_rate': 4.607895228345603e-06, 'epoch': 0.69} {'loss': 0.813, 'learning_rate': 4.602648837159333e-06, 'epoch': 0.69} {'loss': 0.7798, 'learning_rate': 4.597404541410622e-06, 'epoch': 0.69} {'loss': 0.7388, 'learning_rate': 4.592162343135483e-06, 'epoch': 0.69} {'loss': 0.8013, 'learning_rate': 4.586922244369122e-06, 'epoch': 0.69} {'loss': 0.7603, 'learning_rate': 4.5816842471459224e-06, 'epoch': 0.69} {'loss': 0.6868, 'learning_rate': 4.576448353499457e-06, 'epoch': 0.69} {'loss': 0.7739, 'learning_rate': 4.571214565462477e-06, 'epoch': 0.69} {'loss': 0.77, 'learning_rate': 4.565982885066923e-06, 'epoch': 0.69} {'loss': 0.7437, 'learning_rate': 
4.560753314343912e-06, 'epoch': 0.69} {'loss': 0.7441, 'learning_rate': 4.555525855323738e-06, 'epoch': 0.69} {'loss': 0.2793, 'learning_rate': 4.5503005100358945e-06, 'epoch': 0.69} {'loss': 0.8281, 'learning_rate': 4.545077280509022e-06, 'epoch': 0.69} {'loss': 0.8125, 'learning_rate': 4.539856168770974e-06, 'epoch': 0.69} {'loss': 0.7798, 'learning_rate': 4.534637176848758e-06, 'epoch': 0.69} {'loss': 0.7456, 'learning_rate': 4.52942030676857e-06, 'epoch': 0.69} {'loss': 0.7637, 'learning_rate': 4.524205560555774e-06, 'epoch': 0.69} {'loss': 0.7598, 'learning_rate': 4.5189929402349175e-06, 'epoch': 0.69} {'loss': 0.7417, 'learning_rate': 4.513782447829717e-06, 'epoch': 0.69} {'loss': 0.7549, 'learning_rate': 4.508574085363065e-06, 'epoch': 0.69} {'loss': 0.791, 'learning_rate': 4.503367854857035e-06, 'epoch': 0.69} {'loss': 0.7915, 'learning_rate': 4.498163758332853e-06, 'epoch': 0.69} {'loss': 0.7881, 'learning_rate': 4.492961797810932e-06, 'epoch': 0.7} {'loss': 0.7646, 'learning_rate': 4.4877619753108605e-06, 'epoch': 0.7} [2024-01-31 10:27:07,404] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7642, 'learning_rate': 4.4825642928513746e-06, 'epoch': 0.7} [2024-01-31 10:27:26,178] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8066, 'learning_rate': 4.477368752450409e-06, 'epoch': 0.7} {'loss': 0.7637, 'learning_rate': 4.472175356125036e-06, 'epoch': 0.7} [2024-01-31 10:28:03,334] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7842, 'learning_rate': 4.466984105891521e-06, 'epoch': 0.7} [2024-01-31 10:28:26,908] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7388, 'learning_rate': 4.461795003765285e-06, 'epoch': 0.7} {'loss': 0.2808, 'learning_rate': 4.456608051760914e-06, 'epoch': 0.7} {'loss': 0.793, 'learning_rate': 4.45142325189216e-06, 'epoch': 0.7} {'loss': 0.8062, 'learning_rate': 4.446240606171945e-06, 'epoch': 0.7} {'loss': 0.7524, 'learning_rate': 4.4410601166123475e-06, 'epoch': 0.7} {'loss': 0.7822, 'learning_rate': 4.4358817852246124e-06, 'epoch': 0.7} {'loss': 0.8242, 'learning_rate': 4.430705614019147e-06, 'epoch': 0.7} {'loss': 0.7324, 'learning_rate': 4.425531605005519e-06, 'epoch': 0.7} {'loss': 0.7139, 'learning_rate': 4.420359760192452e-06, 'epoch': 0.7} {'loss': 0.2341, 'learning_rate': 4.4151900815878455e-06, 'epoch': 0.7} [2024-01-31 10:31:42,130] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7798, 'learning_rate': 4.410022571198734e-06, 'epoch': 0.7} [2024-01-31 10:32:02,818] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.6694, 'learning_rate': 4.404857231031332e-06, 'epoch': 0.7} {'loss': 0.7563, 'learning_rate': 4.399694063090999e-06, 'epoch': 0.7} {'loss': 0.8013, 'learning_rate': 4.394533069382255e-06, 'epoch': 0.7} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/038794740X.jpg' {'loss': 0.7451, 'learning_rate': 4.3893742519087754e-06, 'epoch': 0.7} {'loss': 0.2991, 'learning_rate': 4.3842176126733914e-06, 'epoch': 0.7} {'loss': 0.8247, 'learning_rate': 4.379063153678087e-06, 'epoch': 0.7} {'loss': 0.7886, 'learning_rate': 4.373910876923997e-06, 'epoch': 0.7} {'loss': 0.73, 'learning_rate': 4.368760784411423e-06, 'epoch': 0.7} {'loss': 0.8115, 'learning_rate': 4.363612878139799e-06, 'epoch': 0.7} {'loss': 0.7783, 'learning_rate': 4.3584671601077224e-06, 'epoch': 0.7} {'loss': 0.7393, 'learning_rate': 4.353323632312938e-06, 'epoch': 0.7} [2024-01-31 10:35:32,659] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7046, 'learning_rate': 4.348182296752336e-06, 'epoch': 0.7} [2024-01-31 10:35:53,194] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. 
if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.731, 'learning_rate': 4.343043155421971e-06, 'epoch': 0.7} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/960536205.jpg' {'loss': 0.7505, 'learning_rate': 4.3379062103170214e-06, 'epoch': 0.7} {'loss': 0.7856, 'learning_rate': 4.332771463431837e-06, 'epoch': 0.7} {'loss': 0.769, 'learning_rate': 4.327638916759898e-06, 'epoch': 0.7} {'loss': 0.7529, 'learning_rate': 4.322508572293836e-06, 'epoch': 0.7} {'loss': 0.7861, 'learning_rate': 4.317380432025428e-06, 'epoch': 0.7} {'loss': 0.7646, 'learning_rate': 4.312254497945595e-06, 'epoch': 0.7} {'loss': 0.771, 'learning_rate': 4.3071307720444015e-06, 'epoch': 0.7} {'loss': 0.7329, 'learning_rate': 4.3020092563110485e-06, 'epoch': 0.7} {'loss': 0.8354, 'learning_rate': 4.2968899527338984e-06, 'epoch': 0.7} {'loss': 0.7129, 'learning_rate': 4.291772863300428e-06, 'epoch': 0.7} {'loss': 0.7437, 'learning_rate': 4.2866579899972686e-06, 'epoch': 0.7} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/521256771.jpg' {'loss': 0.7432, 'learning_rate': 4.281545334810201e-06, 'epoch': 0.7} {'loss': 0.2803, 'learning_rate': 4.276434899724119e-06, 'epoch': 0.7} {'loss': 0.7939, 'learning_rate': 4.27132668672308e-06, 'epoch': 0.7} {'loss': 0.7476, 'learning_rate': 4.266220697790266e-06, 'epoch': 0.7} {'loss': 0.7563, 'learning_rate': 4.2611169349079985e-06, 'epoch': 0.7} {'loss': 0.7783, 'learning_rate': 4.25601540005773e-06, 'epoch': 0.7} {'loss': 0.7402, 'learning_rate': 4.250916095220056e-06, 'epoch': 0.7} {'loss': 0.2701, 'learning_rate': 4.2458190223747e-06, 'epoch': 0.7} {'loss': 0.7622, 'learning_rate': 4.240724183500518e-06, 'epoch': 0.7} {'loss': 0.7456, 'learning_rate': 4.2356315805755135e-06, 'epoch': 0.7} {'loss': 0.7734, 'learning_rate': 4.230541215576798e-06, 'epoch': 0.7} {'loss': 0.7441, 'learning_rate': 4.225453090480631e-06, 'epoch': 0.71} {'loss': 0.73, 'learning_rate': 4.220367207262398e-06, 'epoch': 0.71} {'loss': 0.7129, 'learning_rate': 4.21528356789661e-06, 'epoch': 0.71} [2024-01-31 10:44:00,307] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.813, 'learning_rate': 4.210202174356922e-06, 'epoch': 0.71} [2024-01-31 10:44:20,412] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7749, 'learning_rate': 4.20512302861609e-06, 'epoch': 0.71} [2024-01-31 10:44:40,419] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. 
this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7397, 'learning_rate': 4.2000461326460274e-06, 'epoch': 0.71} {'loss': 0.7412, 'learning_rate': 4.194971488417753e-06, 'epoch': 0.71} {'loss': 0.7266, 'learning_rate': 4.189899097901421e-06, 'epoch': 0.71} {'loss': 0.6523, 'learning_rate': 4.184828963066305e-06, 'epoch': 0.71} {'loss': 0.7319, 'learning_rate': 4.179761085880809e-06, 'epoch': 0.71} {'loss': 0.2578, 'learning_rate': 4.174695468312456e-06, 'epoch': 0.71} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/812015320.jpg' [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/205260780.jpg' {'loss': 0.7617, 'learning_rate': 4.16963211232789e-06, 'epoch': 0.71} {'loss': 0.7354, 'learning_rate': 4.16457101989289e-06, 'epoch': 0.71} {'loss': 0.7092, 'learning_rate': 4.159512192972337e-06, 'epoch': 0.71} {'loss': 0.8086, 'learning_rate': 4.15445563353024e-06, 'epoch': 0.71} {'loss': 0.7705, 'learning_rate': 4.149401343529742e-06, 'epoch': 0.71} {'loss': 0.7031, 'learning_rate': 4.144349324933077e-06, 'epoch': 0.71} {'loss': 0.7139, 'learning_rate': 4.139299579701623e-06, 'epoch': 0.71} {'loss': 0.7212, 'learning_rate': 4.134252109795863e-06, 'epoch': 0.71} {'loss': 0.8188, 'learning_rate': 4.129206917175397e-06, 'epoch': 0.71} {'loss': 0.7646, 'learning_rate': 4.124164003798944e-06, 'epoch': 0.71} {'loss': 0.7205, 'learning_rate': 4.119123371624335e-06, 'epoch': 0.71} {'loss': 0.7451, 'learning_rate': 4.114085022608517e-06, 'epoch': 0.71} {'loss': 0.7861, 'learning_rate': 4.109048958707552e-06, 'epoch': 0.71} {'loss': 0.6855, 'learning_rate': 4.104015181876613e-06, 'epoch': 0.71} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/231072430.jpg' {'loss': 0.7349, 'learning_rate': 4.09898369406998e-06, 'epoch': 0.71} {'loss': 0.7827, 'learning_rate': 4.0939544972410636e-06, 'epoch': 0.71} {'loss': 0.7705, 'learning_rate': 4.0889275933423576e-06, 'epoch': 0.71} {'loss': 0.8076, 'learning_rate': 4.0839029843254815e-06, 'epoch': 0.71} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/067944680X.jpg' {'loss': 0.8003, 'learning_rate': 4.078880672141171e-06, 'epoch': 0.71} {'loss': 0.689, 'learning_rate': 4.073860658739246e-06, 'epoch': 0.71} [2024-01-31 10:52:38,043] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7725, 'learning_rate': 4.068842946068661e-06, 'epoch': 0.71} [2024-01-31 10:52:55,927] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8022, 'learning_rate': 4.063827536077459e-06, 'epoch': 0.71} {'loss': 0.7031, 'learning_rate': 4.058814430712796e-06, 'epoch': 0.71} {'loss': 0.8018, 'learning_rate': 4.0538036319209325e-06, 'epoch': 0.71} {'loss': 0.7529, 'learning_rate': 4.0487951416472324e-06, 'epoch': 0.71} {'loss': 0.7451, 'learning_rate': 4.043788961836164e-06, 'epoch': 0.71} [2024-01-31 10:54:35,919] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7603, 'learning_rate': 4.038785094431295e-06, 'epoch': 0.71} [2024-01-31 10:54:54,272] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7876, 'learning_rate': 4.0337835413753116e-06, 'epoch': 0.71} {'loss': 0.7808, 'learning_rate': 4.0287843046099765e-06, 'epoch': 0.71} [2024-01-31 10:55:29,225] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7964, 'learning_rate': 4.0237873860761645e-06, 'epoch': 0.71} {'loss': 0.7427, 'learning_rate': 4.018792787713865e-06, 'epoch': 0.71} {'loss': 0.7656, 'learning_rate': 4.013800511462135e-06, 'epoch': 0.71} [2024-01-31 10:56:23,863] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7593, 'learning_rate': 4.008810559259162e-06, 'epoch': 0.71} {'loss': 0.7432, 'learning_rate': 4.003822933042213e-06, 'epoch': 0.71} [2024-01-31 10:57:03,419] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7412, 'learning_rate': 3.998837634747655e-06, 'epoch': 0.71} [2024-01-31 10:57:25,156] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7529, 'learning_rate': 3.993854666310955e-06, 'epoch': 0.71} {'loss': 0.8105, 'learning_rate': 3.98887402966667e-06, 'epoch': 0.71} {'loss': 0.769, 'learning_rate': 3.983895726748455e-06, 'epoch': 0.71} {'loss': 0.7437, 'learning_rate': 3.97891975948906e-06, 'epoch': 0.71} {'loss': 0.8018, 'learning_rate': 3.973946129820326e-06, 'epoch': 0.71} {'loss': 0.7964, 'learning_rate': 3.968974839673186e-06, 'epoch': 0.71} {'loss': 0.8291, 'learning_rate': 3.964005890977672e-06, 'epoch': 0.72} {'loss': 0.7617, 'learning_rate': 3.9590392856628946e-06, 'epoch': 0.72} {'loss': 0.7559, 'learning_rate': 3.954075025657058e-06, 'epoch': 0.72} {'loss': 0.8281, 'learning_rate': 3.949113112887471e-06, 'epoch': 0.72} [2024-01-31 11:00:32,755] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7617, 'learning_rate': 3.944153549280506e-06, 'epoch': 0.72} [2024-01-31 11:00:51,122] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7798, 'learning_rate': 3.939196336761645e-06, 'epoch': 0.72} {'loss': 0.8047, 'learning_rate': 3.934241477255445e-06, 'epoch': 0.72} [2024-01-31 11:01:28,696] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8091, 'learning_rate': 3.929288972685555e-06, 'epoch': 0.72} {'loss': 0.8032, 'learning_rate': 3.924338824974705e-06, 'epoch': 0.72} [2024-01-31 11:02:04,218] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8022, 'learning_rate': 3.919391036044715e-06, 'epoch': 0.72} {'loss': 0.7256, 'learning_rate': 3.914445607816486e-06, 'epoch': 0.72} [2024-01-31 11:02:39,153] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7388, 'learning_rate': 3.909502542210001e-06, 'epoch': 0.72} {'loss': 0.7505, 'learning_rate': 3.904561841144338e-06, 'epoch': 0.72} {'loss': 0.7354, 'learning_rate': 3.899623506537635e-06, 'epoch': 0.72} {'loss': 0.7234, 'learning_rate': 3.894687540307127e-06, 'epoch': 0.72} {'loss': 0.2655, 'learning_rate': 3.8897539443691355e-06, 'epoch': 0.72} {'loss': 0.7891, 'learning_rate': 3.884822720639036e-06, 'epoch': 0.72} {'loss': 0.7715, 'learning_rate': 3.879893871031314e-06, 'epoch': 0.72} {'loss': 0.8091, 'learning_rate': 3.874967397459511e-06, 'epoch': 0.72} {'loss': 0.8022, 'learning_rate': 3.870043301836256e-06, 'epoch': 0.72} {'loss': 0.7524, 'learning_rate': 3.86512158607325e-06, 'epoch': 0.72} {'loss': 0.708, 'learning_rate': 3.860202252081276e-06, 'epoch': 0.72} {'loss': 0.7534, 'learning_rate': 3.855285301770188e-06, 'epoch': 0.72} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/785808841.jpg' {'loss': 0.7319, 'learning_rate': 3.850370737048913e-06, 'epoch': 0.72} {'loss': 0.8438, 'learning_rate': 3.8454585598254565e-06, 'epoch': 0.72} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/006270110X.jpg' {'loss': 0.7437, 'learning_rate': 3.840548772006891e-06, 'epoch': 0.72} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/446387355.jpg' {'loss': 0.79, 'learning_rate': 3.835641375499375e-06, 'epoch': 0.72} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/3797306210.jpg' [2024-01-31 11:07:33,258] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8076, 'learning_rate': 3.830736372208118e-06, 'epoch': 0.72} {'loss': 0.7773, 'learning_rate': 3.8258337640374125e-06, 'epoch': 0.72} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/739714600.jpg' [2024-01-31 11:08:11,248] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7378, 'learning_rate': 3.820933552890629e-06, 'epoch': 0.72} [2024-01-31 11:08:30,257] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.2611, 'learning_rate': 3.816035740670185e-06, 'epoch': 0.72} {'loss': 0.7739, 'learning_rate': 3.811140329277591e-06, 'epoch': 0.72} [2024-01-31 11:09:06,817] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7773, 'learning_rate': 3.8062473206134088e-06, 'epoch': 0.72} [2024-01-31 11:09:25,855] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7607, 'learning_rate': 3.8013567165772735e-06, 'epoch': 0.72} [2024-01-31 11:09:46,537] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7842, 'learning_rate': 3.7964685190678874e-06, 'epoch': 0.72} {'loss': 0.7593, 'learning_rate': 3.7915827299830154e-06, 'epoch': 0.72} {'loss': 0.7695, 'learning_rate': 3.7866993512194895e-06, 'epoch': 0.72} {'loss': 0.7734, 'learning_rate': 3.7818183846732024e-06, 'epoch': 0.72} {'loss': 0.7119, 'learning_rate': 3.776939832239125e-06, 'epoch': 0.72} {'loss': 0.8081, 'learning_rate': 3.7720636958112623e-06, 'epoch': 0.72} {'loss': 0.8188, 'learning_rate': 3.7671899772827113e-06, 'epoch': 0.72} {'loss': 0.7666, 'learning_rate': 3.7623186785456156e-06, 'epoch': 0.72} {'loss': 0.7974, 'learning_rate': 3.757449801491172e-06, 'epoch': 0.72} {'loss': 0.7739, 'learning_rate': 3.7525833480096575e-06, 'epoch': 0.72} {'loss': 0.2527, 'learning_rate': 3.7477193199903903e-06, 'epoch': 0.72} {'loss': 0.7783, 'learning_rate': 3.7428577193217563e-06, 'epoch': 0.72} {'loss': 0.7632, 'learning_rate': 3.737998547891195e-06, 'epoch': 0.72} {'loss': 0.7549, 'learning_rate': 3.7331418075852053e-06, 'epoch': 0.72} {'loss': 0.7827, 'learning_rate': 3.728287500289339e-06, 'epoch': 0.72} {'loss': 0.7837, 'learning_rate': 3.7234356278882076e-06, 'epoch': 0.72} {'loss': 0.7803, 'learning_rate': 3.718586192265473e-06, 'epoch': 0.72} {'loss': 0.8081, 'learning_rate': 3.7137391953038516e-06, 'epoch': 0.72} {'loss': 0.8027, 'learning_rate': 3.7088946388851223e-06, 'epoch': 0.73} {'loss': 0.2535, 'learning_rate': 3.7040525248901003e-06, 'epoch': 0.73} {'loss': 0.7622, 'learning_rate': 3.6992128551986617e-06, 'epoch': 0.73} {'loss': 0.7559, 'learning_rate': 3.6943756316897406e-06, 'epoch': 0.73} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1862045852.jpg' {'loss': 0.7383, 'learning_rate': 3.6895408562413027e-06, 'epoch': 0.73} {'loss': 0.7441, 'learning_rate': 3.684708530730382e-06, 'epoch': 0.73} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/4544040604.jpg' {'loss': 0.7739, 'learning_rate': 3.6798786570330526e-06, 'epoch': 0.73} {'loss': 0.7117, 'learning_rate': 3.6750512370244363e-06, 'epoch': 0.73} {'loss': 0.7515, 'learning_rate': 3.670226272578704e-06, 'epoch': 0.73} {'loss': 0.8018, 'learning_rate': 3.6654037655690732e-06, 'epoch': 0.73} {'loss': 0.8257, 'learning_rate': 3.660583717867807e-06, 'epoch': 0.73} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/936783109.jpg' {'loss': 0.7656, 'learning_rate': 3.655766131346211e-06, 'epoch': 0.73} {'loss': 0.7891, 'learning_rate': 3.650951007874648e-06, 'epoch': 0.73} {'loss': 0.7402, 'learning_rate': 3.6461383493225012e-06, 'epoch': 0.73} {'loss': 0.8115, 'learning_rate': 3.6413281575582194e-06, 'epoch': 0.73} {'loss': 0.7373, 'learning_rate': 3.6365204344492867e-06, 'epoch': 0.73} {'loss': 0.7817, 'learning_rate': 3.6317151818622154e-06, 'epoch': 0.73} {'loss': 0.7607, 'learning_rate': 3.62691240166258e-06, 'epoch': 0.73} {'loss': 0.7476, 'learning_rate': 3.6221120957149826e-06, 'epoch': 0.73} {'loss': 0.8271, 'learning_rate': 3.617314265883066e-06, 'epoch': 0.73} {'loss': 0.7627, 'learning_rate': 3.612518914029515e-06, 'epoch': 0.73} {'loss': 0.7993, 'learning_rate': 3.6077260420160487e-06, 'epoch': 0.73} {'loss': 0.8237, 'learning_rate': 3.602935651703424e-06, 'epoch': 0.73} {'loss': 0.7358, 'learning_rate': 3.598147744951438e-06, 'epoch': 0.73} {'loss': 0.7559, 'learning_rate': 
3.5933623236189198e-06, 'epoch': 0.73} [2024-01-31 11:23:00,985] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7227, 'learning_rate': 3.58857938956373e-06, 'epoch': 0.73} {'loss': 0.8062, 'learning_rate': 3.58379894464278e-06, 'epoch': 0.73} [2024-01-31 11:23:37,351] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8018, 'learning_rate': 3.57902099071199e-06, 'epoch': 0.73} {'loss': 0.7461, 'learning_rate': 3.5742455296263346e-06, 'epoch': 0.73} {'loss': 0.7661, 'learning_rate': 3.569472563239814e-06, 'epoch': 0.73} {'loss': 0.7539, 'learning_rate': 3.5647020934054465e-06, 'epoch': 0.73} [2024-01-31 11:24:54,974] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7808, 'learning_rate': 3.559934121975304e-06, 'epoch': 0.73} [2024-01-31 11:25:13,576] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7793, 'learning_rate': 3.5551686508004735e-06, 'epoch': 0.73} [2024-01-31 11:25:32,031] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7861, 'learning_rate': 3.550405681731074e-06, 'epoch': 0.73} [2024-01-31 11:25:52,772] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.853, 'learning_rate': 3.5456452166162547e-06, 'epoch': 0.73} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1579901387.jpg' [2024-01-31 11:26:13,850] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7651, 'learning_rate': 3.540887257304193e-06, 'epoch': 0.73} {'loss': 0.7939, 'learning_rate': 3.5361318056420925e-06, 'epoch': 0.73} {'loss': 0.7417, 'learning_rate': 3.531378863476178e-06, 'epoch': 0.73} {'loss': 0.7002, 'learning_rate': 3.5266284326517165e-06, 'epoch': 0.73} {'loss': 0.7466, 'learning_rate': 3.5218805150129755e-06, 'epoch': 0.73} {'loss': 0.7705, 'learning_rate': 3.5171351124032703e-06, 'epoch': 0.73} {'loss': 0.2843, 'learning_rate': 3.51239222666493e-06, 'epoch': 0.73} {'loss': 0.7456, 'learning_rate': 3.507651859639295e-06, 'epoch': 0.73} {'loss': 0.772, 'learning_rate': 3.5029140131667493e-06, 'epoch': 0.73} {'loss': 0.7661, 'learning_rate': 3.4981786890866853e-06, 'epoch': 0.73} {'loss': 0.7725, 'learning_rate': 3.493445889237518e-06, 'epoch': 0.73} {'loss': 0.2345, 'learning_rate': 3.4887156154566847e-06, 'epoch': 0.73} {'loss': 0.71, 'learning_rate': 3.4839878695806385e-06, 'epoch': 0.73} {'loss': 0.7568, 'learning_rate': 3.4792626534448547e-06, 'epoch': 0.73} {'loss': 0.7397, 'learning_rate': 3.4745399688838243e-06, 'epoch': 0.73} {'loss': 0.7598, 'learning_rate': 3.469819817731056e-06, 'epoch': 0.73} {'loss': 0.7925, 'learning_rate': 3.4651022018190715e-06, 'epoch': 0.73} {'loss': 0.2615, 'learning_rate': 3.460387122979423e-06, 'epoch': 0.74} {'loss': 0.7222, 'learning_rate': 3.455674583042652e-06, 'epoch': 0.74} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1570611912.jpg' {'loss': 0.2595, 'learning_rate': 3.4509645838383386e-06, 'epoch': 0.74} {'loss': 0.7046, 'learning_rate': 3.446257127195066e-06, 'epoch': 0.74} {'loss': 0.7031, 'learning_rate': 3.4415522149404233e-06, 'epoch': 0.74} {'loss': 0.7148, 'learning_rate': 3.436849848901028e-06, 'epoch': 0.74} {'loss': 0.8193, 'learning_rate': 3.432150030902497e-06, 'epoch': 0.74} {'loss': 0.7705, 'learning_rate': 3.427452762769462e-06, 'epoch': 0.74} {'loss': 0.8066, 'learning_rate': 3.4227580463255628e-06, 'epoch': 0.74} {'loss': 0.7534, 'learning_rate': 3.4180658833934523e-06, 'epoch': 0.74} {'loss': 0.7231, 'learning_rate': 3.4133762757947873e-06, 'epoch': 0.74} {'loss': 0.7837, 'learning_rate': 3.4086892253502344e-06, 'epoch': 0.74} {'loss': 0.7812, 'learning_rate': 3.4040047338794756e-06, 'epoch': 0.74} {'loss': 0.8149, 'learning_rate': 3.3993228032011784e-06, 'epoch': 0.74} {'loss': 0.7778, 'learning_rate': 3.3946434351330415e-06, 'epoch': 0.74} {'loss': 0.7656, 'learning_rate': 3.3899666314917512e-06, 'epoch': 0.74} {'loss': 0.8188, 'learning_rate': 3.385292394093006e-06, 'epoch': 0.74} {'loss': 0.7827, 'learning_rate': 3.3806207247515068e-06, 'epoch': 0.74} {'loss': 0.7383, 'learning_rate': 3.375951625280948e-06, 'epoch': 0.74} {'loss': 0.7603, 'learning_rate': 
3.3712850974940437e-06, 'epoch': 0.74} {'loss': 0.7188, 'learning_rate': 3.3666211432024974e-06, 'epoch': 0.74} {'loss': 0.7212, 'learning_rate': 3.361959764217018e-06, 'epoch': 0.74} {'loss': 0.7495, 'learning_rate': 3.357300962347313e-06, 'epoch': 0.74} {'loss': 0.6929, 'learning_rate': 3.3526447394020887e-06, 'epoch': 0.74} {'loss': 0.7593, 'learning_rate': 3.3479910971890516e-06, 'epoch': 0.74} {'loss': 0.8325, 'learning_rate': 3.343340037514903e-06, 'epoch': 0.74} {'loss': 0.7446, 'learning_rate': 3.3386915621853533e-06, 'epoch': 0.74} {'loss': 0.7358, 'learning_rate': 3.3340456730050887e-06, 'epoch': 0.74} {'loss': 0.7778, 'learning_rate': 3.3294023717778122e-06, 'epoch': 0.74} {'loss': 0.7832, 'learning_rate': 3.324761660306215e-06, 'epoch': 0.74} {'loss': 0.7598, 'learning_rate': 3.3201235403919683e-06, 'epoch': 0.74} {'loss': 0.7339, 'learning_rate': 3.3154880138357626e-06, 'epoch': 0.74} {'loss': 0.8052, 'learning_rate': 3.3108550824372632e-06, 'epoch': 0.74} {'loss': 0.7915, 'learning_rate': 3.306224747995136e-06, 'epoch': 0.74} {'loss': 0.8018, 'learning_rate': 3.301597012307034e-06, 'epoch': 0.74} {'loss': 0.7217, 'learning_rate': 3.2969718771696047e-06, 'epoch': 0.74} {'loss': 0.7905, 'learning_rate': 3.292349344378486e-06, 'epoch': 0.74} {'loss': 0.7871, 'learning_rate': 3.287729415728298e-06, 'epoch': 0.74} {'loss': 0.7412, 'learning_rate': 3.283112093012669e-06, 'epoch': 0.74} {'loss': 0.7861, 'learning_rate': 3.278497378024187e-06, 'epoch': 0.74} {'loss': 0.7588, 'learning_rate': 3.2738852725544547e-06, 'epoch': 0.74} {'loss': 0.7705, 'learning_rate': 3.2692757783940467e-06, 'epoch': 0.74} {'loss': 0.7534, 'learning_rate': 3.264668897332527e-06, 'epoch': 0.74} {'loss': 0.73, 'learning_rate': 3.2600646311584494e-06, 'epoch': 0.74} {'loss': 0.7505, 'learning_rate': 3.2554629816593375e-06, 'epoch': 0.74} {'loss': 0.7705, 'learning_rate': 3.250863950621721e-06, 'epoch': 0.74} {'loss': 0.7446, 'learning_rate': 3.2462675398310984e-06, 'epoch': 0.74} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/785805516.jpg' {'loss': 0.7637, 'learning_rate': 3.241673751071954e-06, 'epoch': 0.74} {'loss': 0.73, 'learning_rate': 3.2370825861277567e-06, 'epoch': 0.74} {'loss': 0.8101, 'learning_rate': 3.2324940467809527e-06, 'epoch': 0.74} {'loss': 0.7563, 'learning_rate': 3.2279081348129713e-06, 'epoch': 0.74} {'loss': 0.6951, 'learning_rate': 3.223324852004219e-06, 'epoch': 0.74} {'loss': 0.7891, 'learning_rate': 3.2187442001340942e-06, 'epoch': 0.75} {'loss': 0.7417, 'learning_rate': 3.21416618098095e-06, 'epoch': 0.75} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/939302349.jpg' {'loss': 0.7598, 'learning_rate': 3.2095907963221396e-06, 'epoch': 0.75} {'loss': 0.2416, 'learning_rate': 3.2050180479339865e-06, 'epoch': 0.75} {'loss': 0.7349, 'learning_rate': 3.2004479375917783e-06, 'epoch': 0.75} {'loss': 0.7466, 'learning_rate': 3.1958804670698008e-06, 'epoch': 0.75} {'loss': 0.2672, 'learning_rate': 3.191315638141297e-06, 'epoch': 0.75} {'loss': 0.7534, 'learning_rate': 3.1867534525784937e-06, 'epoch': 0.75} {'loss': 0.262, 'learning_rate': 3.182193912152586e-06, 'epoch': 0.75} {'loss': 0.7773, 'learning_rate': 3.177637018633746e-06, 'epoch': 0.75} {'loss': 0.833, 'learning_rate': 3.1730827737911163e-06, 'epoch': 0.75} {'loss': 0.7778, 'learning_rate': 3.1685311793928077e-06, 'epoch': 0.75} {'loss': 0.7314, 'learning_rate': 3.163982237205917e-06, 'epoch': 0.75} {'loss': 0.769, 'learning_rate': 3.1594359489964853e-06, 'epoch': 0.75} {'loss': 
0.7632, 'learning_rate': 3.15489231652955e-06, 'epoch': 0.75} {'loss': 0.7725, 'learning_rate': 3.150351341569101e-06, 'epoch': 0.75} {'loss': 0.2538, 'learning_rate': 3.1458130258781006e-06, 'epoch': 0.75} {'loss': 0.7998, 'learning_rate': 3.141277371218484e-06, 'epoch': 0.75} {'loss': 0.7915, 'learning_rate': 3.136744379351139e-06, 'epoch': 0.75} {'loss': 0.2666, 'learning_rate': 3.1322140520359366e-06, 'epoch': 0.75} {'loss': 0.731, 'learning_rate': 3.1276863910317057e-06, 'epoch': 0.75} {'loss': 0.7979, 'learning_rate': 3.1231613980962373e-06, 'epoch': 0.75} {'loss': 0.7808, 'learning_rate': 3.1186390749862904e-06, 'epoch': 0.75} {'loss': 0.7749, 'learning_rate': 3.1141194234575878e-06, 'epoch': 0.75} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/962770124.jpg' {'loss': 0.7207, 'learning_rate': 3.1096024452648123e-06, 'epoch': 0.75} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/60164255.jpg' {'loss': 0.8188, 'learning_rate': 3.1050881421616076e-06, 'epoch': 0.75} {'loss': 0.7739, 'learning_rate': 3.100576515900591e-06, 'epoch': 0.75} {'loss': 0.7603, 'learning_rate': 3.0960675682333186e-06, 'epoch': 0.75} {'loss': 0.7476, 'learning_rate': 3.0915613009103296e-06, 'epoch': 0.75} {'loss': 0.7759, 'learning_rate': 3.0870577156811077e-06, 'epoch': 0.75} {'loss': 0.7378, 'learning_rate': 3.0825568142940998e-06, 'epoch': 0.75} {'loss': 0.7217, 'learning_rate': 3.0780585984967113e-06, 'epoch': 0.75} [2024-01-31 11:57:28,585] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7485, 'learning_rate': 3.073563070035305e-06, 'epoch': 0.75} [2024-01-31 11:57:47,372] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7305, 'learning_rate': 3.069070230655198e-06, 'epoch': 0.75} {'loss': 0.7461, 'learning_rate': 3.0645800821006667e-06, 'epoch': 0.75} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/135707978.jpg' [2024-01-31 11:58:26,568] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.6978, 'learning_rate': 3.060092626114941e-06, 'epoch': 0.75} {'loss': 0.7632, 'learning_rate': 3.0556078644402066e-06, 'epoch': 0.75} {'loss': 0.748, 'learning_rate': 3.051125798817598e-06, 'epoch': 0.75} {'loss': 0.7998, 'learning_rate': 3.0466464309872167e-06, 'epoch': 0.75} {'loss': 0.7627, 'learning_rate': 3.042169762688096e-06, 'epoch': 0.75} {'loss': 0.7383, 'learning_rate': 3.0376957956582452e-06, 'epoch': 0.75} {'loss': 0.2555, 'learning_rate': 3.0332245316346e-06, 'epoch': 0.75} {'loss': 0.7202, 'learning_rate': 3.0287559723530667e-06, 'epoch': 0.75} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/393314286.jpg' {'loss': 0.8228, 'learning_rate': 3.024290119548495e-06, 'epoch': 0.75} {'loss': 0.7349, 'learning_rate': 3.019826974954674e-06, 'epoch': 0.75} {'loss': 0.731, 'learning_rate': 3.0153665403043586e-06, 'epoch': 0.75} {'loss': 0.7466, 'learning_rate': 3.01090881732924e-06, 'epoch': 0.75} {'loss': 0.7549, 'learning_rate': 3.0064538077599603e-06, 'epoch': 0.75} {'loss': 0.793, 'learning_rate': 3.002001513326107e-06, 'epoch': 0.75} [2024-01-31 12:02:49,669] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7739, 'learning_rate': 2.9975519357562155e-06, 'epoch': 0.75} [2024-01-31 12:03:06,345] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8154, 'learning_rate': 2.9931050767777626e-06, 'epoch': 0.75} {'loss': 0.7837, 'learning_rate': 2.9886609381171703e-06, 'epoch': 0.75} {'loss': 0.7686, 'learning_rate': 2.984219521499816e-06, 'epoch': 0.76} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/71351817.jpg' {'loss': 0.7681, 'learning_rate': 2.9797808286499976e-06, 'epoch': 0.76} {'loss': 0.8184, 'learning_rate': 2.9753448612909775e-06, 'epoch': 0.76} {'loss': 0.7104, 'learning_rate': 2.9709116211449484e-06, 'epoch': 0.76} {'loss': 0.7271, 'learning_rate': 2.966481109933047e-06, 'epoch': 0.76} {'loss': 0.7495, 'learning_rate': 2.9620533293753495e-06, 'epoch': 0.76} {'loss': 0.7896, 'learning_rate': 2.957628281190873e-06, 'epoch': 0.76} {'loss': 0.7534, 'learning_rate': 2.9532059670975732e-06, 'epoch': 0.76} {'loss': 0.7656, 'learning_rate': 2.948786388812346e-06, 'epoch': 0.76} [2024-01-31 12:06:29,365] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7378, 'learning_rate': 2.9443695480510225e-06, 'epoch': 0.76} [2024-01-31 12:06:48,357] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7588, 'learning_rate': 2.9399554465283742e-06, 'epoch': 0.76} [2024-01-31 12:07:08,456] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7036, 'learning_rate': 2.935544085958102e-06, 'epoch': 0.76} {'loss': 0.7231, 'learning_rate': 2.931135468052858e-06, 'epoch': 0.76} {'loss': 0.2771, 'learning_rate': 2.926729594524207e-06, 'epoch': 0.76} {'loss': 0.7637, 'learning_rate': 2.9223264670826746e-06, 'epoch': 0.76} {'loss': 0.8149, 'learning_rate': 2.9179260874376915e-06, 'epoch': 0.76} {'loss': 0.8052, 'learning_rate': 2.9135284572976486e-06, 'epoch': 0.76} {'loss': 0.7324, 'learning_rate': 2.9091335783698517e-06, 'epoch': 0.76} {'loss': 0.2484, 'learning_rate': 2.9047414523605467e-06, 'epoch': 0.76} {'loss': 0.7812, 'learning_rate': 2.9003520809749053e-06, 'epoch': 0.76} {'loss': 0.791, 'learning_rate': 2.8959654659170354e-06, 'epoch': 0.76} {'loss': 0.7729, 'learning_rate': 2.8915816088899696e-06, 'epoch': 0.76} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/316142778.jpg' {'loss': 0.7275, 'learning_rate': 2.8872005115956746e-06, 'epoch': 0.76} {'loss': 0.7666, 'learning_rate': 2.8828221757350406e-06, 'epoch': 0.76} {'loss': 0.7896, 'learning_rate': 2.8784466030078905e-06, 'epoch': 0.76} {'loss': 0.7432, 'learning_rate': 2.874073795112967e-06, 'epoch': 0.76} {'loss': 0.7764, 'learning_rate': 2.8697037537479565e-06, 'epoch': 0.76} {'loss': 0.7891, 'learning_rate': 2.8653364806094454e-06, 'epoch': 0.76} {'loss': 0.2744, 'learning_rate': 2.86097197739297e-06, 'epoch': 0.76} {'loss': 0.7656, 'learning_rate': 2.856610245792976e-06, 'epoch': 0.76} {'loss': 0.7231, 'learning_rate': 2.8522512875028396e-06, 'epoch': 0.76} {'loss': 0.8169, 'learning_rate': 2.847895104214856e-06, 'epoch': 0.76} {'loss': 0.7822, 'learning_rate': 2.843541697620249e-06, 'epoch': 0.76} {'loss': 0.7603, 'learning_rate': 2.8391910694091584e-06, 'epoch': 0.76} {'loss': 0.7983, 'learning_rate': 2.8348432212706443e-06, 'epoch': 0.76} {'loss': 0.7515, 'learning_rate': 2.8304981548927025e-06, 'epoch': 0.76} {'loss': 0.2103, 'learning_rate': 2.826155871962227e-06, 'epoch': 0.76} {'loss': 0.7847, 'learning_rate': 2.8218163741650415e-06, 'epoch': 0.76} {'loss': 0.7983, 'learning_rate': 2.817479663185898e-06, 'epoch': 0.76} {'loss': 0.7729, 'learning_rate': 2.813145740708445e-06, 'epoch': 0.76} [2024-01-31 12:16:07,981] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache 
flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7207, 'learning_rate': 2.808814608415271e-06, 'epoch': 0.76} [2024-01-31 12:16:25,959] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7993, 'learning_rate': 2.8044862679878605e-06, 'epoch': 0.76} {'loss': 0.7163, 'learning_rate': 2.800160721106633e-06, 'epoch': 0.76} {'loss': 0.7749, 'learning_rate': 2.7958379694509108e-06, 'epoch': 0.76} {'loss': 0.7432, 'learning_rate': 2.791518014698935e-06, 'epoch': 0.76} {'loss': 0.7271, 'learning_rate': 2.787200858527862e-06, 'epoch': 0.76} {'loss': 0.7998, 'learning_rate': 2.7828865026137584e-06, 'epoch': 0.76} {'loss': 0.7793, 'learning_rate': 2.7785749486316085e-06, 'epoch': 0.76} {'loss': 0.7676, 'learning_rate': 2.774266198255303e-06, 'epoch': 0.76} {'loss': 0.7622, 'learning_rate': 2.7699602531576496e-06, 'epoch': 0.76} {'loss': 0.7744, 'learning_rate': 2.765657115010364e-06, 'epoch': 0.76} {'loss': 0.8149, 'learning_rate': 2.7613567854840685e-06, 'epoch': 0.76} {'loss': 0.7886, 'learning_rate': 2.7570592662483086e-06, 'epoch': 0.77} {'loss': 0.77, 'learning_rate': 2.752764558971517e-06, 'epoch': 0.77} {'loss': 0.7354, 'learning_rate': 2.748472665321056e-06, 'epoch': 0.77} {'loss': 0.7871, 'learning_rate': 2.744183586963185e-06, 'epoch': 0.77} [2024-01-31 12:21:00,270] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8062, 'learning_rate': 2.739897325563069e-06, 'epoch': 0.77} [2024-01-31 12:21:19,810] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7925, 'learning_rate': 2.7356138827847856e-06, 'epoch': 0.77} [2024-01-31 12:21:37,237] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7119, 'learning_rate': 2.731333260291311e-06, 'epoch': 0.77} {'loss': 0.7407, 'learning_rate': 2.7270554597445343e-06, 'epoch': 0.77} {'loss': 0.2643, 'learning_rate': 2.7227804828052384e-06, 'epoch': 0.77} [2024-01-31 12:22:40,653] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7905, 'learning_rate': 2.7185083311331283e-06, 'epoch': 0.77} [2024-01-31 12:22:58,892] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7656, 'learning_rate': 2.7142390063867896e-06, 'epoch': 0.77} {'loss': 0.7285, 'learning_rate': 2.709972510223725e-06, 'epoch': 0.77} {'loss': 0.7095, 'learning_rate': 2.7057088443003343e-06, 'epoch': 0.77} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/030724055X.jpg' {'loss': 0.728, 'learning_rate': 2.7014480102719174e-06, 'epoch': 0.77} {'loss': 0.813, 'learning_rate': 2.697190009792685e-06, 'epoch': 0.77} {'loss': 0.7041, 'learning_rate': 2.692934844515729e-06, 'epoch': 0.77} {'loss': 0.7949, 'learning_rate': 2.6886825160930587e-06, 'epoch': 0.77} {'loss': 0.7505, 'learning_rate': 2.6844330261755715e-06, 'epoch': 0.77} {'loss': 0.7368, 'learning_rate': 2.6801863764130653e-06, 'epoch': 0.77} {'loss': 0.7373, 'learning_rate': 2.675942568454236e-06, 'epoch': 0.77} {'loss': 0.7637, 'learning_rate': 2.671701603946678e-06, 'epoch': 0.77} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/945397690.jpg' {'loss': 0.7646, 'learning_rate': 2.667463484536876e-06, 'epoch': 0.77} {'loss': 0.7617, 'learning_rate': 2.6632282118702147e-06, 'epoch': 0.77} {'loss': 0.7007, 'learning_rate': 2.65899578759098e-06, 'epoch': 0.77} {'loss': 0.7285, 'learning_rate': 2.654766213342335e-06, 'epoch': 0.77} {'loss': 0.7178, 'learning_rate': 2.650539490766346e-06, 'epoch': 0.77} {'loss': 0.7217, 'learning_rate': 2.646315621503983e-06, 'epoch': 0.77} {'loss': 0.7319, 'learning_rate': 2.642094607195085e-06, 'epoch': 0.77} {'loss': 0.7651, 'learning_rate': 2.6378764494784027e-06, 'epoch': 0.77} {'loss': 0.7671, 'learning_rate': 2.633661149991569e-06, 'epoch': 0.77} {'loss': 0.7212, 'learning_rate': 2.6294487103711064e-06, 'epoch': 0.77} [2024-01-31 12:32:17,884] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7759, 'learning_rate': 2.6252391322524297e-06, 'epoch': 0.77} {'loss': 0.8003, 'learning_rate': 2.6210324172698432e-06, 'epoch': 0.77} [2024-01-31 12:32:54,240] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7461, 'learning_rate': 2.6168285670565374e-06, 'epoch': 0.77} [2024-01-31 12:33:14,346] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7891, 'learning_rate': 2.6126275832445892e-06, 'epoch': 0.77} {'loss': 0.6887, 'learning_rate': 2.6084294674649734e-06, 'epoch': 0.77} {'loss': 0.8042, 'learning_rate': 2.6042342213475346e-06, 'epoch': 0.77} {'loss': 0.7612, 'learning_rate': 2.6000418465210143e-06, 'epoch': 0.77} {'loss': 0.7739, 'learning_rate': 2.595852344613038e-06, 'epoch': 0.77} {'loss': 0.7778, 'learning_rate': 2.5916657172501103e-06, 'epoch': 0.77} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1887089160.jpg' {'loss': 0.7231, 'learning_rate': 2.587481966057633e-06, 'epoch': 0.77} {'loss': 0.7256, 'learning_rate': 2.583301092659872e-06, 'epoch': 0.77} {'loss': 0.7495, 'learning_rate': 2.5791230986799944e-06, 'epoch': 0.77} {'loss': 0.7939, 'learning_rate': 2.5749479857400383e-06, 'epoch': 0.77} {'loss': 0.7832, 'learning_rate': 2.5707757554609247e-06, 'epoch': 0.77} [2024-01-31 12:36:43,396] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7964, 'learning_rate': 2.56660640946246e-06, 'epoch': 0.77} {'loss': 0.772, 'learning_rate': 2.5624399493633257e-06, 'epoch': 0.77} {'loss': 0.8228, 'learning_rate': 2.558276376781086e-06, 'epoch': 0.77} {'loss': 0.7344, 'learning_rate': 2.55411569333218e-06, 'epoch': 0.77} {'loss': 0.7651, 'learning_rate': 2.5499579006319365e-06, 'epoch': 0.77} {'loss': 0.813, 'learning_rate': 2.5458030002945457e-06, 'epoch': 0.77} {'loss': 0.7734, 'learning_rate': 2.5416509939330836e-06, 'epoch': 0.77} {'loss': 0.2803, 'learning_rate': 2.537501883159509e-06, 'epoch': 0.78} {'loss': 0.7725, 'learning_rate': 2.5333556695846384e-06, 'epoch': 0.78} {'loss': 0.752, 'learning_rate': 2.5292123548181847e-06, 'epoch': 0.78} {'loss': 0.7407, 'learning_rate': 2.525071940468722e-06, 'epoch': 0.78} {'loss': 0.7744, 'learning_rate': 2.520934428143701e-06, 'epoch': 0.78} {'loss': 0.2649, 'learning_rate': 2.5167998194494468e-06, 'epoch': 0.78} {'loss': 0.7266, 'learning_rate': 2.5126681159911558e-06, 'epoch': 0.78} {'loss': 0.7515, 'learning_rate': 2.5085393193729e-06, 'epoch': 0.78} {'loss': 0.8042, 'learning_rate': 2.5044134311976156e-06, 'epoch': 0.78} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1568362021.jpg' {'loss': 0.7324, 'learning_rate': 2.5002904530671236e-06, 'epoch': 0.78} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1555951295.jpg' {'loss': 0.7925, 'learning_rate': 2.4961703865820974e-06, 'epoch': 0.78} {'loss': 0.7739, 'learning_rate': 2.492053233342091e-06, 'epoch': 0.78} {'loss': 0.7546, 'learning_rate': 2.487938994945527e-06, 'epoch': 0.78} {'loss': 0.7344, 'learning_rate': 2.4838276729896884e-06, 'epoch': 0.78} {'loss': 0.264, 'learning_rate': 2.479719269070743e-06, 'epoch': 0.78} {'loss': 0.7495, 'learning_rate': 2.4756137847837025e-06, 'epoch': 0.78} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/821411896.jpg' {'loss': 0.7427, 'learning_rate': 2.4715112217224657e-06, 'epoch': 0.78} {'loss': 0.8184, 'learning_rate': 2.467411581479786e-06, 'epoch': 0.78} {'loss': 0.731, 'learning_rate': 2.463314865647286e-06, 'epoch': 0.78} {'loss': 0.7666, 'learning_rate': 2.45922107581545e-06, 'epoch': 0.78} {'loss': 0.77, 'learning_rate': 2.4551302135736287e-06, 'epoch': 0.78} {'loss': 0.7451, 'learning_rate': 2.4510422805100366e-06, 'epoch': 0.78} WARNING: tokenization mismatch: 1 vs. 1473. (ignored) {'loss': 0.7485, 'learning_rate': 2.446957278211746e-06, 'epoch': 0.78} {'loss': 0.7363, 'learning_rate': 2.4428752082647044e-06, 'epoch': 0.78} {'loss': 0.7036, 'learning_rate': 2.438796072253704e-06, 'epoch': 0.78} {'loss': 0.7114, 'learning_rate': 2.4347198717624054e-06, 'epoch': 0.78} {'loss': 0.7817, 'learning_rate': 2.4306466083733392e-06, 'epoch': 0.78} {'loss': 0.7236, 'learning_rate': 2.426576283667873e-06, 'epoch': 0.78} [2024-01-31 12:47:22,806] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7593, 'learning_rate': 2.422508899226258e-06, 'epoch': 0.78} {'loss': 0.7935, 'learning_rate': 2.418444456627589e-06, 'epoch': 0.78} [2024-01-31 12:48:00,521] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7563, 'learning_rate': 2.4143829574498224e-06, 'epoch': 0.78} [2024-01-31 12:48:19,932] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.245, 'learning_rate': 2.4103244032697717e-06, 'epoch': 0.78} {'loss': 0.7407, 'learning_rate': 2.406268795663108e-06, 'epoch': 0.78} {'loss': 0.7925, 'learning_rate': 2.4022161362043574e-06, 'epoch': 0.78} {'loss': 0.7866, 'learning_rate': 2.3981664264669025e-06, 'epoch': 0.78} {'loss': 0.7168, 'learning_rate': 2.3941196680229794e-06, 'epoch': 0.78} {'loss': 0.7734, 'learning_rate': 2.3900758624436772e-06, 'epoch': 0.78} {'loss': 0.7573, 'learning_rate': 2.3860350112989473e-06, 'epoch': 0.78} {'loss': 0.7646, 'learning_rate': 2.3819971161575807e-06, 'epoch': 0.78} {'loss': 0.7808, 'learning_rate': 2.3779621785872252e-06, 'epoch': 0.78} {'loss': 0.7319, 'learning_rate': 2.3739302001543918e-06, 'epoch': 0.78} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/70224889.jpg' {'loss': 0.8359, 'learning_rate': 2.3699011824244234e-06, 'epoch': 0.78} {'loss': 0.7383, 'learning_rate': 2.365875126961531e-06, 'epoch': 0.78} {'loss': 0.7827, 'learning_rate': 2.3618520353287644e-06, 'epoch': 0.78} {'loss': 0.7964, 'learning_rate': 2.3578319090880263e-06, 'epoch': 0.78} {'loss': 0.7876, 'learning_rate': 2.3538147498000695e-06, 'epoch': 0.78} {'loss': 0.769, 'learning_rate': 2.349800559024492e-06, 'epoch': 0.78} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1883323703.jpg' {'loss': 0.8086, 'learning_rate': 2.3457893383197415e-06, 'epoch': 0.78} {'loss': 0.7603, 'learning_rate': 2.3417810892431104e-06, 'epoch': 0.78} {'loss': 0.7852, 'learning_rate': 2.3377758133507455e-06, 'epoch': 0.78} {'loss': 0.7915, 'learning_rate': 2.3337735121976247e-06, 'epoch': 0.78} {'loss': 0.7666, 'learning_rate': 2.32977418733758e-06, 'epoch': 0.78} {'loss': 0.7656, 'learning_rate': 2.3257778403232954e-06, 'epoch': 0.79} {'loss': 0.7471, 'learning_rate': 2.321784472706279e-06, 'epoch': 0.79} {'loss': 0.7803, 'learning_rate': 2.317794086036901e-06, 'epoch': 0.79} {'loss': 0.7651, 'learning_rate': 2.3138066818643647e-06, 'epoch': 0.79} {'loss': 0.7417, 'learning_rate': 2.3098222617367184e-06, 'epoch': 0.79} {'loss': 0.7852, 'learning_rate': 2.30584082720085e-06, 'epoch': 0.79} {'loss': 0.7661, 'learning_rate': 2.301862379802492e-06, 'epoch': 0.79} 
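The stage3.py warning that repeats throughout this run explicitly suggests adding get_accelerator().empty_cache() calls to the training loop so every rank flushes the PyTorch allocator cache at the same step. Below is a minimal sketch of what that could look like; it is not this run's actual training script, and the engine/dataloader names and the flush interval are assumptions made purely for illustration.

# Sketch only: periodically flush the allocator cache on every rank, as the
# DeepSpeed stage3 warning above recommends.
# `engine` (a deepspeed.DeepSpeedEngine) and `train_loader` are assumed to
# already exist in the real training script.
from deepspeed.accelerator import get_accelerator

FLUSH_EVERY = 50  # assumed interval; tune to how often the warning fires

for step, batch in enumerate(train_loader):
    loss = engine(**batch).loss   # forward pass through the DeepSpeed engine
    engine.backward(loss)
    engine.step()

    # Flushing on all ranks at the same step keeps the caches synchronized
    # under memory pressure, which is the condition stage3.py is warning about.
    if step % FLUSH_EVERY == 0:
        get_accelerator().empty_cache()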
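Separately, the recurring "[Errno 2] No such file or directory: './playground/data/ocr_vqa/images/...'" lines show that some OCR-VQA samples reference image files that are not on disk. A small pre-check along these lines can surface or drop those records before training instead of hitting the error mid-epoch; this is only a sketch, and the annotation path and the per-record "image" field are assumptions about a LLaVA-style JSON layout, not a description of this run's data files.

# Sketch: drop annotation records whose image file is missing on disk.
# Assumes a JSON list of records, each with an "image" field; adjust the
# paths and field name to the actual annotation format used.
import json
import os

image_root = "./playground/data/ocr_vqa/images"
ann_path = "./playground/data/ocr_vqa_annotations.json"  # assumed path

with open(ann_path) as f:
    records = json.load(f)

kept, missing = [], []
for rec in records:
    if os.path.exists(os.path.join(image_root, os.path.basename(rec["image"]))):
        kept.append(rec)
    else:
        missing.append(rec["image"])

print(f"{len(missing)} records reference missing images, e.g. {missing[:5]}")

with open("./playground/data/ocr_vqa_annotations.filtered.json", "w") as f:
    json.dump(kept, f)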
{'loss': 0.7871, 'learning_rate': 2.297886921086211e-06, 'epoch': 0.79} {'loss': 0.7251, 'learning_rate': 2.2939144525954194e-06, 'epoch': 0.79} {'loss': 0.7852, 'learning_rate': 2.2899449758723657e-06, 'epoch': 0.79} {'loss': 0.6846, 'learning_rate': 2.285978492458134e-06, 'epoch': 0.79} {'loss': 0.8101, 'learning_rate': 2.282015003892659e-06, 'epoch': 0.79} {'loss': 0.7319, 'learning_rate': 2.2780545117146947e-06, 'epoch': 0.79} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/060960323X.jpg' {'loss': 0.792, 'learning_rate': 2.2740970174618405e-06, 'epoch': 0.79} {'loss': 0.7324, 'learning_rate': 2.270142522670541e-06, 'epoch': 0.79} {'loss': 0.7563, 'learning_rate': 2.2661910288760545e-06, 'epoch': 0.79} {'loss': 0.7563, 'learning_rate': 2.262242537612497e-06, 'epoch': 0.79} {'loss': 0.8276, 'learning_rate': 2.258297050412804e-06, 'epoch': 0.79} {'loss': 0.6538, 'learning_rate': 2.254354568808752e-06, 'epoch': 0.79} {'loss': 0.7466, 'learning_rate': 2.2504150943309455e-06, 'epoch': 0.79} {'loss': 0.7769, 'learning_rate': 2.246478628508827e-06, 'epoch': 0.79} {'loss': 0.7329, 'learning_rate': 2.242545172870665e-06, 'epoch': 0.79} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/031476271X.jpg' {'loss': 0.7266, 'learning_rate': 2.238614728943561e-06, 'epoch': 0.79} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/937274461.jpg' {'loss': 0.6953, 'learning_rate': 2.2346872982534584e-06, 'epoch': 0.79} [2024-01-31 13:02:16,362] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7168, 'learning_rate': 2.2307628823251083e-06, 'epoch': 0.79} {'loss': 0.7437, 'learning_rate': 2.2268414826821117e-06, 'epoch': 0.79} {'loss': 0.8047, 'learning_rate': 2.222923100846893e-06, 'epoch': 0.79} [2024-01-31 13:03:09,082] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7378, 'learning_rate': 2.2190077383406938e-06, 'epoch': 0.79} {'loss': 0.7778, 'learning_rate': 2.2150953966835996e-06, 'epoch': 0.79} {'loss': 0.7681, 'learning_rate': 2.211186077394516e-06, 'epoch': 0.79} {'loss': 0.8218, 'learning_rate': 2.207279781991173e-06, 'epoch': 0.79} {'loss': 0.709, 'learning_rate': 2.2033765119901294e-06, 'epoch': 0.79} [2024-01-31 13:04:44,721] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7168, 'learning_rate': 2.1994762689067705e-06, 'epoch': 0.79} [2024-01-31 13:05:05,617] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7905, 'learning_rate': 2.1955790542553036e-06, 'epoch': 0.79} [2024-01-31 13:05:24,446] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7954, 'learning_rate': 2.1916848695487615e-06, 'epoch': 0.79} {'loss': 0.7197, 'learning_rate': 2.1877937162990015e-06, 'epoch': 0.79} [2024-01-31 13:05:59,578] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7856, 'learning_rate': 2.1839055960167e-06, 'epoch': 0.79} {'loss': 0.7886, 'learning_rate': 2.180020510211367e-06, 'epoch': 0.79} [2024-01-31 13:06:35,517] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.749, 'learning_rate': 2.1761384603913203e-06, 'epoch': 0.79} [2024-01-31 13:06:52,783] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7852, 'learning_rate': 2.172259448063704e-06, 'epoch': 0.79} {'loss': 0.7676, 'learning_rate': 2.1683834747344913e-06, 'epoch': 0.79} {'loss': 0.7285, 'learning_rate': 2.1645105419084587e-06, 'epoch': 0.79} {'loss': 0.7939, 'learning_rate': 2.160640651089221e-06, 'epoch': 0.79} {'loss': 0.2622, 'learning_rate': 2.1567738037791998e-06, 'epoch': 0.79} {'loss': 0.7241, 'learning_rate': 2.152910001479638e-06, 'epoch': 0.79} {'loss': 0.7212, 'learning_rate': 2.1490492456905964e-06, 'epoch': 0.79} {'loss': 0.7656, 'learning_rate': 2.1451915379109546e-06, 'epoch': 0.79} {'loss': 0.7876, 'learning_rate': 2.141336879638406e-06, 'epoch': 0.79} {'loss': 0.771, 'learning_rate': 2.1374852723694595e-06, 'epoch': 0.79} {'loss': 0.7627, 'learning_rate': 2.133636717599451e-06, 'epoch': 0.79} {'loss': 0.7017, 'learning_rate': 2.1297912168225086e-06, 'epoch': 0.79} {'loss': 0.708, 'learning_rate': 2.1259487715316e-06, 'epoch': 0.79} {'loss': 0.7188, 'learning_rate': 2.1221093832184903e-06, 'epoch': 0.8} {'loss': 0.7437, 'learning_rate': 2.118273053373757e-06, 'epoch': 0.8} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/073970477X.jpg' {'loss': 0.7441, 'learning_rate': 2.1144397834868034e-06, 'epoch': 0.8} {'loss': 0.752, 'learning_rate': 2.1106095750458332e-06, 'epoch': 0.8} {'loss': 0.769, 'learning_rate': 2.106782429537866e-06, 'epoch': 0.8} {'loss': 0.7788, 'learning_rate': 2.1029583484487315e-06, 'epoch': 0.8} {'loss': 0.7812, 'learning_rate': 2.0991373332630683e-06, 'epoch': 0.8} {'loss': 0.7686, 'learning_rate': 2.0953193854643274e-06, 'epoch': 0.8} {'loss': 0.7478, 'learning_rate': 2.0915045065347673e-06, 'epoch': 0.8} {'loss': 0.7798, 'learning_rate': 2.0876926979554545e-06, 'epoch': 0.8} {'loss': 0.7485, 'learning_rate': 2.0838839612062633e-06, 'epoch': 0.8} {'loss': 0.75, 'learning_rate': 2.080078297765884e-06, 'epoch': 0.8} {'loss': 0.7832, 'learning_rate': 2.0762757091117937e-06, 'epoch': 0.8} {'loss': 0.7568, 'learning_rate': 2.0724761967202987e-06, 'epoch': 0.8} {'loss': 0.7285, 'learning_rate': 2.0686797620664987e-06, 'epoch': 0.8} {'loss': 0.813, 'learning_rate': 2.0648864066242937e-06, 'epoch': 0.8} {'loss': 0.2665, 'learning_rate': 2.0610961318664013e-06, 'epoch': 0.8} {'loss': 0.7627, 'learning_rate': 2.0573089392643362e-06, 'epoch': 0.8} {'loss': 0.7959, 'learning_rate': 2.0535248302884147e-06, 'epoch': 0.8} {'loss': 0.6836, 'learning_rate': 2.0497438064077603e-06, 'epoch': 0.8} {'loss': 0.7588, 'learning_rate': 2.045965869090295e-06, 'epoch': 0.8} {'loss': 0.7803, 'learning_rate': 2.0421910198027452e-06, 'epoch': 0.8} {'loss': 0.7852, 'learning_rate': 2.0384192600106335e-06, 'epoch': 0.8} {'loss': 0.7939, 'learning_rate': 2.0346505911782956e-06, 'epoch': 0.8} {'loss': 0.7275, 'learning_rate': 2.0308850147688484e-06, 'epoch': 0.8} {'loss': 0.7598, 'learning_rate': 2.0271225322442255e-06, 'epoch': 0.8} {'loss': 0.7803, 'learning_rate': 2.0233631450651525e-06, 'epoch': 0.8} {'loss': 0.7734, 'learning_rate': 2.019606854691145e-06, 'epoch': 0.8} {'loss': 0.7749, 'learning_rate': 2.0158536625805325e-06, 'epoch': 0.8} {'loss': 0.7339, 'learning_rate': 2.01210357019043e-06, 'epoch': 0.8} {'loss': 0.7642, 'learning_rate': 2.008356578976752e-06, 'epoch': 0.8} {'loss': 0.769, 'learning_rate': 2.004612690394212e-06, 'epoch': 0.8} [2024-01-31 13:20:30,736] [WARNING] 
[stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8013, 'learning_rate': 2.0008719058963144e-06, 'epoch': 0.8} {'loss': 0.7671, 'learning_rate': 1.997134226935361e-06, 'epoch': 0.8} {'loss': 0.7979, 'learning_rate': 1.9933996549624468e-06, 'epoch': 0.8} {'loss': 0.8208, 'learning_rate': 1.9896681914274616e-06, 'epoch': 0.8} {'loss': 0.8193, 'learning_rate': 1.9859398377790872e-06, 'epoch': 0.8} {'loss': 0.7876, 'learning_rate': 1.982214595464804e-06, 'epoch': 0.8} {'loss': 0.8159, 'learning_rate': 1.97849246593087e-06, 'epoch': 0.8} {'loss': 0.7964, 'learning_rate': 1.9747734506223525e-06, 'epoch': 0.8} {'loss': 0.8047, 'learning_rate': 1.9710575509831008e-06, 'epoch': 0.8} {'loss': 0.7168, 'learning_rate': 1.967344768455747e-06, 'epoch': 0.8} {'loss': 0.7151, 'learning_rate': 1.9636351044817292e-06, 'epoch': 0.8} {'loss': 0.7759, 'learning_rate': 1.9599285605012643e-06, 'epoch': 0.8} {'loss': 0.2457, 'learning_rate': 1.9562251379533593e-06, 'epoch': 0.8} {'loss': 0.8076, 'learning_rate': 1.952524838275811e-06, 'epoch': 0.8} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/139642625.jpg' [2024-01-31 13:24:58,942] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7974, 'learning_rate': 1.9488276629052026e-06, 'epoch': 0.8} {'loss': 0.7539, 'learning_rate': 1.945133613276907e-06, 'epoch': 0.8} {'loss': 0.77, 'learning_rate': 1.941442690825076e-06, 'epoch': 0.8} {'loss': 0.769, 'learning_rate': 1.937754896982663e-06, 'epoch': 0.8} {'loss': 0.7183, 'learning_rate': 1.9340702331813842e-06, 'epoch': 0.8} {'loss': 0.833, 'learning_rate': 1.9303887008517618e-06, 'epoch': 0.8} {'loss': 0.7637, 'learning_rate': 1.9267103014230935e-06, 'epoch': 0.81} [2024-01-31 13:27:07,306] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.688, 'learning_rate': 1.923035036323452e-06, 'epoch': 0.81} [2024-01-31 13:27:26,261] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7227, 'learning_rate': 1.91936290697971e-06, 'epoch': 0.81} {'loss': 0.7217, 'learning_rate': 1.9156939148175125e-06, 'epoch': 0.81} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/739715593.jpg' [2024-01-31 13:28:03,834] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.748, 'learning_rate': 1.9120280612612873e-06, 'epoch': 0.81} [2024-01-31 13:28:31,785] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7012, 'learning_rate': 1.9083653477342467e-06, 'epoch': 0.81} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/835607240.jpg' [2024-01-31 13:28:57,403] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.6919, 'learning_rate': 1.904705775658381e-06, 'epoch': 0.81} [2024-01-31 13:29:15,452] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8159, 'learning_rate': 1.9010493464544621e-06, 'epoch': 0.81} [2024-01-31 13:29:41,284] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7231, 'learning_rate': 1.8973960615420416e-06, 'epoch': 0.81} [2024-01-31 13:29:59,928] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.2537, 'learning_rate': 1.8937459223394517e-06, 'epoch': 0.81} [2024-01-31 13:30:21,184] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.6865, 'learning_rate': 1.8900989302637985e-06, 'epoch': 0.81} {'loss': 0.7075, 'learning_rate': 1.8864550867309771e-06, 'epoch': 0.81} {'loss': 0.7339, 'learning_rate': 1.8828143931556442e-06, 'epoch': 0.81} [2024-01-31 13:31:16,658] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8057, 'learning_rate': 1.8791768509512487e-06, 'epoch': 0.81} {'loss': 0.7725, 'learning_rate': 1.875542461530011e-06, 'epoch': 0.81} [2024-01-31 13:31:52,506] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7729, 'learning_rate': 1.871911226302917e-06, 'epoch': 0.81} [2024-01-31 13:32:09,799] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.793, 'learning_rate': 1.868283146679747e-06, 'epoch': 0.81} {'loss': 0.7393, 'learning_rate': 1.8646582240690414e-06, 'epoch': 0.81} {'loss': 0.7812, 'learning_rate': 1.8610364598781227e-06, 'epoch': 0.81} {'loss': 0.7007, 'learning_rate': 1.8574178555130818e-06, 'epoch': 0.81} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/761522751.jpg' {'loss': 0.7476, 'learning_rate': 1.8538024123787868e-06, 'epoch': 0.81} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/B007K53FQ4.jpg' {'loss': 0.7407, 'learning_rate': 1.8501901318788773e-06, 'epoch': 0.81} {'loss': 0.6812, 'learning_rate': 1.8465810154157626e-06, 'epoch': 0.81} {'loss': 0.7617, 'learning_rate': 1.8429750643906331e-06, 'epoch': 0.81} [2024-01-31 13:34:41,453] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. 
if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7275, 'learning_rate': 1.8393722802034331e-06, 'epoch': 0.81} [2024-01-31 13:35:00,288] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7412, 'learning_rate': 1.835772664252895e-06, 'epoch': 0.81} [2024-01-31 13:35:17,853] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7505, 'learning_rate': 1.832176217936511e-06, 'epoch': 0.81} {'loss': 0.7207, 'learning_rate': 1.8285829426505453e-06, 'epoch': 0.81} {'loss': 0.7637, 'learning_rate': 1.8249928397900351e-06, 'epoch': 0.81} {'loss': 0.7451, 'learning_rate': 1.8214059107487726e-06, 'epoch': 0.81} {'loss': 0.7778, 'learning_rate': 1.8178221569193343e-06, 'epoch': 0.81} {'loss': 0.7783, 'learning_rate': 1.8142415796930568e-06, 'epoch': 0.81} {'loss': 0.7812, 'learning_rate': 1.8106641804600411e-06, 'epoch': 0.81} {'loss': 0.7715, 'learning_rate': 1.8070899606091586e-06, 'epoch': 0.81} {'loss': 0.7466, 'learning_rate': 1.8035189215280423e-06, 'epoch': 0.81} {'loss': 0.7563, 'learning_rate': 1.799951064603095e-06, 'epoch': 0.81} {'loss': 0.7344, 'learning_rate': 1.7963863912194768e-06, 'epoch': 0.81} {'loss': 0.7368, 'learning_rate': 1.7928249027611255e-06, 'epoch': 0.81} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/201624508.jpg' {'loss': 0.6987, 'learning_rate': 1.789266600610724e-06, 'epoch': 0.81} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/688121675.jpg' {'loss': 0.7617, 'learning_rate': 1.7857114861497337e-06, 'epoch': 0.81} {'loss': 0.792, 'learning_rate': 1.782159560758373e-06, 'epoch': 0.81} {'loss': 0.7646, 'learning_rate': 1.7786108258156154e-06, 'epoch': 0.81} [2024-01-31 13:40:07,490] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7275, 'learning_rate': 1.7750652826992077e-06, 'epoch': 0.81} {'loss': 0.7729, 'learning_rate': 1.7715229327856498e-06, 'epoch': 0.81} {'loss': 0.7114, 'learning_rate': 1.7679837774502052e-06, 'epoch': 0.81} {'loss': 0.7607, 'learning_rate': 1.7644478180668945e-06, 'epoch': 0.81} {'loss': 0.7744, 'learning_rate': 1.7609150560084986e-06, 'epoch': 0.81} {'loss': 0.7344, 'learning_rate': 1.7573854926465582e-06, 'epoch': 0.81} {'loss': 0.709, 'learning_rate': 1.7538591293513685e-06, 'epoch': 0.81} {'loss': 0.7949, 'learning_rate': 1.7503359674919929e-06, 'epoch': 0.81} {'loss': 0.7485, 'learning_rate': 1.746816008436234e-06, 'epoch': 0.81} {'loss': 0.7363, 'learning_rate': 1.7432992535506687e-06, 'epoch': 0.81} {'loss': 0.7612, 'learning_rate': 1.7397857042006194e-06, 'epoch': 0.82} {'loss': 0.7773, 'learning_rate': 1.736275361750167e-06, 'epoch': 0.82} {'loss': 0.8052, 'learning_rate': 1.7327682275621506e-06, 'epoch': 0.82} {'loss': 0.79, 'learning_rate': 1.7292643029981525e-06, 'epoch': 0.82} {'loss': 0.7764, 'learning_rate': 1.7257635894185232e-06, 'epoch': 0.82} {'loss': 0.7339, 'learning_rate': 1.7222660881823594e-06, 'epoch': 0.82} {'loss': 0.7505, 'learning_rate': 1.7187718006475117e-06, 'epoch': 0.82} {'loss': 0.7153, 'learning_rate': 1.7152807281705809e-06, 'epoch': 0.82} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/093062596X.jpg' {'loss': 0.7534, 'learning_rate': 1.7117928721069233e-06, 'epoch': 0.82} {'loss': 0.7725, 'learning_rate': 1.708308233810644e-06, 'epoch': 0.82} {'loss': 0.7109, 'learning_rate': 1.704826814634597e-06, 'epoch': 0.82} {'loss': 0.7603, 'learning_rate': 1.701348615930397e-06, 'epoch': 0.82} {'loss': 0.7515, 'learning_rate': 1.6978736390483896e-06, 'epoch': 0.82} {'loss': 0.6614, 'learning_rate': 1.6944018853376898e-06, 'epoch': 0.82} {'loss': 0.7075, 'learning_rate': 1.6909333561461471e-06, 'epoch': 0.82} {'loss': 0.7866, 'learning_rate': 1.6874680528203657e-06, 'epoch': 0.82} {'loss': 0.7583, 'learning_rate': 1.6840059767056949e-06, 'epoch': 0.82} {'loss': 0.7466, 'learning_rate': 1.6805471291462316e-06, 'epoch': 0.82} {'loss': 0.7935, 'learning_rate': 1.6770915114848197e-06, 'epoch': 0.82} {'loss': 0.7324, 'learning_rate': 1.67363912506305e-06, 'epoch': 0.82} {'loss': 0.7983, 'learning_rate': 1.6701899712212565e-06, 'epoch': 0.82} {'loss': 0.7959, 'learning_rate': 1.66674405129852e-06, 'epoch': 0.82} {'loss': 0.7607, 'learning_rate': 1.6633013666326636e-06, 'epoch': 0.82} {'loss': 0.7061, 'learning_rate': 1.6598619185602616e-06, 'epoch': 0.82} {'loss': 0.2504, 'learning_rate': 1.656425708416617e-06, 'epoch': 0.82} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/786884061.jpg' {'loss': 0.7456, 'learning_rate': 1.6529927375357957e-06, 'epoch': 0.82} {'loss': 0.7769, 'learning_rate': 1.6495630072505841e-06, 'epoch': 0.82} {'loss': 0.7471, 'learning_rate': 1.6461365188925304e-06, 'epoch': 0.82} {'loss': 0.7588, 'learning_rate': 1.642713273791914e-06, 'epoch': 0.82} {'loss': 0.771, 'learning_rate': 1.6392932732777489e-06, 'epoch': 0.82} {'loss': 0.8149, 'learning_rate': 1.6358765186778057e-06, 'epoch': 0.82} {'loss': 0.7456, 'learning_rate': 1.6324630113185835e-06, 'epoch': 0.82} {'loss': 0.7529, 'learning_rate': 1.629052752525323e-06, 'epoch': 0.82} {'loss': 0.8022, 'learning_rate': 1.625645743622003e-06, 'epoch': 
0.82} {'loss': 0.7803, 'learning_rate': 1.6222419859313443e-06, 'epoch': 0.82} {'loss': 0.7832, 'learning_rate': 1.6188414807747999e-06, 'epoch': 0.82} {'loss': 0.7539, 'learning_rate': 1.6154442294725636e-06, 'epoch': 0.82} {'loss': 0.2471, 'learning_rate': 1.6120502333435695e-06, 'epoch': 0.82} {'loss': 0.75, 'learning_rate': 1.6086594937054767e-06, 'epoch': 0.82} {'loss': 0.7632, 'learning_rate': 1.6052720118746923e-06, 'epoch': 0.82} {'loss': 0.7495, 'learning_rate': 1.6018877891663521e-06, 'epoch': 0.82} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/749517735.jpg' {'loss': 0.7354, 'learning_rate': 1.5985068268943283e-06, 'epoch': 0.82} {'loss': 0.7656, 'learning_rate': 1.5951291263712255e-06, 'epoch': 0.82} {'loss': 0.7666, 'learning_rate': 1.5917546889083834e-06, 'epoch': 0.82} {'loss': 0.7329, 'learning_rate': 1.5883835158158767e-06, 'epoch': 0.82} {'loss': 0.7778, 'learning_rate': 1.5850156084025091e-06, 'epoch': 0.82} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/717281434.jpg' {'loss': 0.8018, 'learning_rate': 1.5816509679758185e-06, 'epoch': 0.82} {'loss': 0.7131, 'learning_rate': 1.578289595842074e-06, 'epoch': 0.82} {'loss': 0.8013, 'learning_rate': 1.5749314933062754e-06, 'epoch': 0.82} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/810928949.jpg' {'loss': 0.7393, 'learning_rate': 1.5715766616721584e-06, 'epoch': 0.82} {'loss': 0.7642, 'learning_rate': 1.5682251022421757e-06, 'epoch': 0.82} [2024-01-31 13:58:42,977] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7153, 'learning_rate': 1.5648768163175277e-06, 'epoch': 0.82} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/067173363X.jpg' [2024-01-31 13:59:01,822] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.2585, 'learning_rate': 1.5615318051981243e-06, 'epoch': 0.83} [2024-01-31 13:59:23,137] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7383, 'learning_rate': 1.5581900701826226e-06, 'epoch': 0.83} {'loss': 0.7778, 'learning_rate': 1.5548516125683976e-06, 'epoch': 0.83} {'loss': 0.7842, 'learning_rate': 1.5515164336515465e-06, 'epoch': 0.83} {'loss': 0.7583, 'learning_rate': 1.5481845347269077e-06, 'epoch': 0.83} {'loss': 0.7754, 'learning_rate': 1.5448559170880373e-06, 'epoch': 0.83} {'loss': 0.7783, 'learning_rate': 1.5415305820272198e-06, 'epoch': 0.83} {'loss': 0.7485, 'learning_rate': 1.5382085308354633e-06, 'epoch': 0.83} {'loss': 0.7896, 'learning_rate': 1.534889764802503e-06, 'epoch': 0.83} {'loss': 0.7407, 'learning_rate': 1.5315742852167992e-06, 'epoch': 0.83} {'loss': 0.7964, 'learning_rate': 1.5282620933655312e-06, 'epoch': 0.83} {'loss': 0.7998, 'learning_rate': 1.5249531905346138e-06, 'epoch': 0.83} {'loss': 0.7793, 'learning_rate': 1.521647578008667e-06, 'epoch': 0.83} {'loss': 0.7075, 'learning_rate': 1.5183452570710522e-06, 'epoch': 0.83} {'loss': 0.2534, 'learning_rate': 1.5150462290038392e-06, 'epoch': 0.83} {'loss': 0.7744, 'learning_rate': 1.511750495087827e-06, 'epoch': 0.83} {'loss': 0.8252, 'learning_rate': 1.5084580566025309e-06, 'epoch': 0.83} {'loss': 0.7935, 'learning_rate': 1.5051689148261895e-06, 'epoch': 0.83} {'loss': 0.6851, 'learning_rate': 1.5018830710357612e-06, 'epoch': 0.83} {'loss': 0.769, 'learning_rate': 1.4986005265069204e-06, 'epoch': 0.83} {'loss': 0.7593, 'learning_rate': 1.4953212825140728e-06, 'epoch': 0.83} {'loss': 0.7148, 'learning_rate': 1.4920453403303249e-06, 'epoch': 0.83} {'loss': 0.2587, 'learning_rate': 1.4887727012275112e-06, 'epoch': 0.83} {'loss': 0.6392, 'learning_rate': 1.4855033664761898e-06, 'epoch': 0.83} {'loss': 0.7725, 'learning_rate': 1.48223733734562e-06, 'epoch': 0.83} {'loss': 0.7393, 'learning_rate': 1.4789746151037942e-06, 'epoch': 0.83} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/739715100.jpg' {'loss': 0.2446, 'learning_rate': 1.475715201017407e-06, 'epoch': 0.83} {'loss': 0.7397, 'learning_rate': 1.4724590963518803e-06, 'epoch': 0.83} {'loss': 0.7271, 'learning_rate': 1.4692063023713444e-06, 'epoch': 0.83} {'loss': 0.7529, 'learning_rate': 1.4659568203386464e-06, 'epoch': 0.83} {'loss': 0.7539, 'learning_rate': 1.4627106515153456e-06, 'epoch': 0.83} {'loss': 0.7812, 'learning_rate': 1.4594677971617178e-06, 'epoch': 0.83} {'loss': 0.7554, 'learning_rate': 1.4562282585367493e-06, 'epoch': 0.83} {'loss': 0.7192, 'learning_rate': 1.452992036898142e-06, 'epoch': 0.83} {'loss': 0.7065, 'learning_rate': 1.4497591335023087e-06, 'epoch': 0.83} {'loss': 0.7729, 'learning_rate': 1.446529549604373e-06, 'epoch': 0.83} {'loss': 0.811, 'learning_rate': 1.4433032864581687e-06, 'epoch': 0.83} {'loss': 0.7031, 'learning_rate': 1.4400803453162482e-06, 'epoch': 0.83} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/688116191.jpg' {'loss': 0.7915, 'learning_rate': 1.4368607274298596e-06, 'epoch': 0.83} {'loss': 0.2592, 'learning_rate': 1.4336444340489775e-06, 'epoch': 0.83} {'loss': 0.7461, 'learning_rate': 1.430431466422273e-06, 'epoch': 0.83} {'loss': 0.7651, 'learning_rate': 1.4272218257971327e-06, 'epoch': 0.83} {'loss': 0.7764, 'learning_rate': 1.4240155134196499e-06, 'epoch': 0.83} {'loss': 0.7021, 'learning_rate': 1.4208125305346232e-06, 'epoch': 0.83} {'loss': 0.7446, 'learning_rate': 1.4176128783855636e-06, 
'epoch': 0.83} {'loss': 0.7559, 'learning_rate': 1.4144165582146819e-06, 'epoch': 0.83} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/188673206X.jpg' {'loss': 0.7305, 'learning_rate': 1.4112235712629063e-06, 'epoch': 0.83} {'loss': 0.6899, 'learning_rate': 1.40803391876986e-06, 'epoch': 0.83} {'loss': 0.7432, 'learning_rate': 1.4048476019738756e-06, 'epoch': 0.83} {'loss': 0.7334, 'learning_rate': 1.4016646221119912e-06, 'epoch': 0.83} {'loss': 0.73, 'learning_rate': 1.3984849804199485e-06, 'epoch': 0.83} {'loss': 0.8018, 'learning_rate': 1.395308678132199e-06, 'epoch': 0.83} {'loss': 0.7788, 'learning_rate': 1.392135716481885e-06, 'epoch': 0.84} {'loss': 0.7622, 'learning_rate': 1.3889660967008656e-06, 'epoch': 0.84} {'loss': 0.7646, 'learning_rate': 1.3857998200196943e-06, 'epoch': 0.84} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/78821185.jpg' {'loss': 0.7549, 'learning_rate': 1.3826368876676278e-06, 'epoch': 0.84} {'loss': 0.7578, 'learning_rate': 1.379477300872626e-06, 'epoch': 0.84} {'loss': 0.7383, 'learning_rate': 1.3763210608613497e-06, 'epoch': 0.84} {'loss': 0.7139, 'learning_rate': 1.3731681688591593e-06, 'epoch': 0.84} {'loss': 0.7241, 'learning_rate': 1.370018626090116e-06, 'epoch': 0.84} {'loss': 0.7166, 'learning_rate': 1.3668724337769823e-06, 'epoch': 0.84} {'loss': 0.7529, 'learning_rate': 1.3637295931412153e-06, 'epoch': 0.84} {'loss': 0.7461, 'learning_rate': 1.3605901054029746e-06, 'epoch': 0.84} {'loss': 0.7397, 'learning_rate': 1.3574539717811231e-06, 'epoch': 0.84} {'loss': 0.7891, 'learning_rate': 1.3543211934932065e-06, 'epoch': 0.84} {'loss': 0.7446, 'learning_rate': 1.3511917717554846e-06, 'epoch': 0.84} {'loss': 0.6953, 'learning_rate': 1.348065707782904e-06, 'epoch': 0.84} {'loss': 0.71, 'learning_rate': 1.3449430027891096e-06, 'epoch': 0.84} {'loss': 0.7607, 'learning_rate': 1.3418236579864452e-06, 'epoch': 0.84} {'loss': 0.8022, 'learning_rate': 1.338707674585945e-06, 'epoch': 0.84} {'loss': 0.7842, 'learning_rate': 1.3355950537973438e-06, 'epoch': 0.84} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1561700940.jpg' {'loss': 0.7583, 'learning_rate': 1.332485796829065e-06, 'epoch': 0.84} {'loss': 0.7925, 'learning_rate': 1.329379904888235e-06, 'epoch': 0.84} {'loss': 0.7002, 'learning_rate': 1.3262773791806617e-06, 'epoch': 0.84} {'loss': 0.7754, 'learning_rate': 1.3231782209108546e-06, 'epoch': 0.84} {'loss': 0.6938, 'learning_rate': 1.3200824312820137e-06, 'epoch': 0.84} {'loss': 0.6924, 'learning_rate': 1.3169900114960298e-06, 'epoch': 0.84} {'loss': 0.7661, 'learning_rate': 1.3139009627534927e-06, 'epoch': 0.84} {'loss': 0.8276, 'learning_rate': 1.3108152862536683e-06, 'epoch': 0.84} {'loss': 0.7632, 'learning_rate': 1.3077329831945295e-06, 'epoch': 0.84} {'loss': 0.7158, 'learning_rate': 1.3046540547727305e-06, 'epoch': 0.84} {'loss': 0.748, 'learning_rate': 1.3015785021836159e-06, 'epoch': 0.84} {'loss': 0.7388, 'learning_rate': 1.2985063266212229e-06, 'epoch': 0.84} {'loss': 0.7358, 'learning_rate': 1.295437529278275e-06, 'epoch': 0.84} {'loss': 0.7798, 'learning_rate': 1.2923721113461852e-06, 'epoch': 0.84} {'loss': 0.6777, 'learning_rate': 1.2893100740150522e-06, 'epoch': 0.84} {'loss': 0.7417, 'learning_rate': 1.2862514184736695e-06, 'epoch': 0.84} {'loss': 0.7544, 'learning_rate': 1.2831961459095088e-06, 'epoch': 0.84} {'loss': 0.7734, 'learning_rate': 1.2801442575087296e-06, 'epoch': 0.84} {'loss': 0.7534, 'learning_rate': 1.2770957544561868e-06, 'epoch': 0.84} {'loss': 
0.7554, 'learning_rate': 1.274050637935408e-06, 'epoch': 0.84} {'loss': 0.7305, 'learning_rate': 1.2710089091286148e-06, 'epoch': 0.84} {'loss': 0.7358, 'learning_rate': 1.2679705692167122e-06, 'epoch': 0.84} {'loss': 0.8037, 'learning_rate': 1.2649356193792873e-06, 'epoch': 0.84} {'loss': 0.7773, 'learning_rate': 1.261904060794612e-06, 'epoch': 0.84} {'loss': 0.2566, 'learning_rate': 1.2588758946396417e-06, 'epoch': 0.84} {'loss': 0.7397, 'learning_rate': 1.2558511220900138e-06, 'epoch': 0.84} {'loss': 0.7646, 'learning_rate': 1.2528297443200489e-06, 'epoch': 0.84} {'loss': 0.7637, 'learning_rate': 1.2498117625027562e-06, 'epoch': 0.84} [2024-01-31 14:29:29,759] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7422, 'learning_rate': 1.246797177809812e-06, 'epoch': 0.84} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1560984503.jpg' [2024-01-31 14:29:47,551] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.728, 'learning_rate': 1.2437859914115847e-06, 'epoch': 0.84} {'loss': 0.7227, 'learning_rate': 1.2407782044771211e-06, 'epoch': 0.84} {'loss': 0.7231, 'learning_rate': 1.237773818174146e-06, 'epoch': 0.84} [2024-01-31 14:30:43,017] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7793, 'learning_rate': 1.23477283366907e-06, 'epoch': 0.84} {'loss': 0.7388, 'learning_rate': 1.2317752521269722e-06, 'epoch': 0.85} [2024-01-31 14:31:22,959] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7036, 'learning_rate': 1.2287810747116224e-06, 'epoch': 0.85} {'loss': 0.7817, 'learning_rate': 1.225790302585461e-06, 'epoch': 0.85} {'loss': 0.2627, 'learning_rate': 1.2228029369096094e-06, 'epoch': 0.85} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/832904651.jpg' {'loss': 0.6958, 'learning_rate': 1.2198189788438652e-06, 'epoch': 0.85} [2024-01-31 14:32:31,719] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. 
this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7295, 'learning_rate': 1.216838429546704e-06, 'epoch': 0.85} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/966355903.jpg' {'loss': 0.7324, 'learning_rate': 1.2138612901752777e-06, 'epoch': 0.85} {'loss': 0.6675, 'learning_rate': 1.2108875618854122e-06, 'epoch': 0.85} WARNING: tokenization mismatch: 1 vs. 789. (ignored) {'loss': 0.7832, 'learning_rate': 1.2079172458316168e-06, 'epoch': 0.85} {'loss': 0.7886, 'learning_rate': 1.204950343167065e-06, 'epoch': 0.85} {'loss': 0.2574, 'learning_rate': 1.2019868550436099e-06, 'epoch': 0.85} {'loss': 0.7173, 'learning_rate': 1.1990267826117874e-06, 'epoch': 0.85} {'loss': 0.7437, 'learning_rate': 1.1960701270207885e-06, 'epoch': 0.85} {'loss': 0.7178, 'learning_rate': 1.1931168894184974e-06, 'epoch': 0.85} {'loss': 0.7344, 'learning_rate': 1.19016707095146e-06, 'epoch': 0.85} {'loss': 0.7656, 'learning_rate': 1.187220672764897e-06, 'epoch': 0.85} {'loss': 0.2483, 'learning_rate': 1.1842776960027014e-06, 'epoch': 0.85} {'loss': 0.2607, 'learning_rate': 1.1813381418074388e-06, 'epoch': 0.85} {'loss': 0.7676, 'learning_rate': 1.1784020113203453e-06, 'epoch': 0.85} {'loss': 0.8042, 'learning_rate': 1.1754693056813272e-06, 'epoch': 0.85} {'loss': 0.77, 'learning_rate': 1.172540026028962e-06, 'epoch': 0.85} {'loss': 0.73, 'learning_rate': 1.169614173500494e-06, 'epoch': 0.85} {'loss': 0.7344, 'learning_rate': 1.1666917492318486e-06, 'epoch': 0.85} [2024-01-31 14:38:13,610] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7017, 'learning_rate': 1.1637727543576027e-06, 'epoch': 0.85} {'loss': 0.7822, 'learning_rate': 1.1608571900110122e-06, 'epoch': 0.85} {'loss': 0.7539, 'learning_rate': 1.1579450573240058e-06, 'epoch': 0.85} {'loss': 0.7593, 'learning_rate': 1.1550363574271638e-06, 'epoch': 0.85} {'loss': 0.7173, 'learning_rate': 1.1521310914497518e-06, 'epoch': 0.85} {'loss': 0.7583, 'learning_rate': 1.149229260519691e-06, 'epoch': 0.85} {'loss': 0.811, 'learning_rate': 1.1463308657635718e-06, 'epoch': 0.85} {'loss': 0.7544, 'learning_rate': 1.1434359083066515e-06, 'epoch': 0.85} {'loss': 0.7681, 'learning_rate': 1.140544389272853e-06, 'epoch': 0.85} {'loss': 0.7285, 'learning_rate': 1.1376563097847616e-06, 'epoch': 0.85} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1555951120.jpg' {'loss': 0.7305, 'learning_rate': 1.1347716709636282e-06, 'epoch': 0.85} {'loss': 0.7451, 'learning_rate': 1.1318904739293745e-06, 'epoch': 0.85} {'loss': 0.7798, 'learning_rate': 1.129012719800575e-06, 'epoch': 0.85} {'loss': 0.7231, 'learning_rate': 1.1261384096944728e-06, 'epoch': 0.85} {'loss': 0.7261, 'learning_rate': 1.1232675447269803e-06, 'epoch': 0.85} {'loss': 0.7959, 'learning_rate': 1.1204001260126574e-06, 'epoch': 0.85} {'loss': 0.7617, 'learning_rate': 1.1175361546647413e-06, 'epoch': 0.85} {'loss': 0.7817, 'learning_rate': 1.1146756317951224e-06, 'epoch': 0.85} {'loss': 0.8032, 'learning_rate': 1.1118185585143536e-06, 'epoch': 0.85} {'loss': 0.7363, 'learning_rate': 1.1089649359316501e-06, 'epoch': 0.85} {'loss': 0.791, 'learning_rate': 1.1061147651548855e-06, 'epoch': 0.85} {'loss': 0.8008, 'learning_rate': 1.1032680472905932e-06, 'epoch': 0.85} {'loss': 0.7334, 'learning_rate': 1.1004247834439697e-06, 'epoch': 0.85} {'loss': 0.2422, 'learning_rate': 1.097584974718866e-06, 'epoch': 0.85} {'loss': 0.7373, 'learning_rate': 1.0947486222177928e-06, 'epoch': 0.85} {'loss': 0.2542, 'learning_rate': 1.0919157270419257e-06, 'epoch': 0.85} {'loss': 0.707, 'learning_rate': 1.0890862902910849e-06, 'epoch': 0.85} {'loss': 0.7324, 'learning_rate': 1.0862603130637562e-06, 'epoch': 0.85} {'loss': 0.7847, 'learning_rate': 1.0834377964570863e-06, 'epoch': 0.85} [2024-01-31 14:47:22,124] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7593, 'learning_rate': 1.0806187415668668e-06, 'epoch': 0.86} {'loss': 0.7979, 'learning_rate': 1.0778031494875574e-06, 'epoch': 0.86} {'loss': 0.729, 'learning_rate': 1.0749910213122649e-06, 'epoch': 0.86} {'loss': 0.7812, 'learning_rate': 1.072182358132755e-06, 'epoch': 0.86} {'loss': 0.7324, 'learning_rate': 1.0693771610394477e-06, 'epoch': 0.86} {'loss': 0.7642, 'learning_rate': 1.066575431121417e-06, 'epoch': 0.86} {'loss': 0.7178, 'learning_rate': 1.06377716946639e-06, 'epoch': 0.86} {'loss': 0.7656, 'learning_rate': 1.0609823771607487e-06, 'epoch': 0.86} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/876043082.jpg' {'loss': 0.814, 'learning_rate': 1.0581910552895302e-06, 'epoch': 0.86} {'loss': 0.7188, 'learning_rate': 1.055403204936416e-06, 'epoch': 0.86} {'loss': 0.7871, 'learning_rate': 1.0526188271837512e-06, 'epoch': 0.86} {'loss': 0.769, 'learning_rate': 1.0498379231125278e-06, 'epoch': 0.86} {'loss': 0.7046, 'learning_rate': 1.047060493802381e-06, 'epoch': 0.86} {'loss': 0.752, 'learning_rate': 1.0442865403316117e-06, 'epoch': 0.86} {'loss': 0.7163, 'learning_rate': 1.0415160637771604e-06, 'epoch': 0.86} {'loss': 0.7354, 'learning_rate': 1.0387490652146236e-06, 'epoch': 0.86} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1570761116.jpg' {'loss': 0.7568, 'learning_rate': 1.0359855457182455e-06, 'epoch': 0.86} {'loss': 0.7651, 'learning_rate': 1.0332255063609177e-06, 'epoch': 0.86} {'loss': 0.7681, 'learning_rate': 1.0304689482141839e-06, 'epoch': 0.86} {'loss': 0.7329, 'learning_rate': 1.027715872348234e-06, 'epoch': 0.86} {'loss': 0.7788, 'learning_rate': 1.0249662798319072e-06, 'epoch': 0.86} {'loss': 0.2332, 'learning_rate': 1.0222201717326885e-06, 'epoch': 0.86} {'loss': 0.7656, 'learning_rate': 1.0194775491167164e-06, 'epoch': 0.86} {'loss': 0.8013, 'learning_rate': 1.0167384130487667e-06, 'epoch': 0.86} {'loss': 0.8545, 'learning_rate': 1.0140027645922656e-06, 'epoch': 0.86} {'loss': 0.7534, 'learning_rate': 1.0112706048092924e-06, 'epoch': 0.86} {'loss': 0.7671, 'learning_rate': 1.0085419347605575e-06, 'epoch': 0.86} {'loss': 0.707, 'learning_rate': 1.00581675550543e-06, 'epoch': 0.86} {'loss': 0.7339, 'learning_rate': 1.003095068101917e-06, 'epoch': 0.86} {'loss': 0.7471, 'learning_rate': 1.0003768736066722e-06, 'epoch': 0.86} {'loss': 0.7568, 'learning_rate': 9.976621730749892e-07, 'epoch': 0.86} {'loss': 0.7759, 'learning_rate': 9.949509675608115e-07, 'epoch': 0.86} {'loss': 0.7163, 'learning_rate': 9.922432581167207e-07, 'epoch': 0.86} {'loss': 0.7261, 'learning_rate': 9.895390457939414e-07, 'epoch': 0.86} {'loss': 0.8228, 'learning_rate': 9.86838331642348e-07, 'epoch': 0.86} {'loss': 0.7588, 'learning_rate': 9.84141116710442e-07, 'epoch': 0.86} {'loss': 0.7852, 'learning_rate': 9.814474020453824e-07, 'epoch': 0.86} {'loss': 0.8076, 'learning_rate': 9.787571886929604e-07, 'epoch': 0.86} {'loss': 0.7461, 'learning_rate': 9.76070477697605e-07, 'epoch': 0.86} {'loss': 0.7505, 'learning_rate': 9.733872701023938e-07, 'epoch': 0.86} {'loss': 0.7993, 'learning_rate': 9.707075669490407e-07, 'epoch': 0.86} {'loss': 0.7871, 'learning_rate': 9.680313692778976e-07, 'epoch': 0.86} [2024-01-31 15:00:20,508] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. 
this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7266, 'learning_rate': 9.653586781279567e-07, 'epoch': 0.86} [2024-01-31 15:00:38,868] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7275, 'learning_rate': 9.626894945368492e-07, 'epoch': 0.86} {'loss': 0.7954, 'learning_rate': 9.600238195408428e-07, 'epoch': 0.86} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1562613480.jpg' {'loss': 0.7222, 'learning_rate': 9.573616541748464e-07, 'epoch': 0.86} [2024-01-31 15:01:33,699] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.751, 'learning_rate': 9.547029994724023e-07, 'epoch': 0.86} [2024-01-31 15:01:57,209] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7695, 'learning_rate': 9.520478564656898e-07, 'epoch': 0.86} [2024-01-31 15:02:20,272] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7812, 'learning_rate': 9.49396226185535e-07, 'epoch': 0.86} {'loss': 0.2537, 'learning_rate': 9.467481096613829e-07, 'epoch': 0.86} {'loss': 0.7256, 'learning_rate': 9.441035079213267e-07, 'epoch': 0.86} {'loss': 0.7939, 'learning_rate': 9.414624219920953e-07, 'epoch': 0.86} {'loss': 0.7622, 'learning_rate': 9.38824852899043e-07, 'epoch': 0.87} {'loss': 0.7061, 'learning_rate': 9.361908016661703e-07, 'epoch': 0.87} {'loss': 0.7754, 'learning_rate': 9.335602693161039e-07, 'epoch': 0.87} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1558216103.jpg' [2024-01-31 15:04:35,219] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. 
if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7373, 'learning_rate': 9.309332568701079e-07, 'epoch': 0.87} [2024-01-31 15:05:03,185] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7002, 'learning_rate': 9.283097653480788e-07, 'epoch': 0.87} {'loss': 0.7607, 'learning_rate': 9.256897957685463e-07, 'epoch': 0.87} {'loss': 0.7544, 'learning_rate': 9.230733491486721e-07, 'epoch': 0.87} {'loss': 0.7651, 'learning_rate': 9.204604265042505e-07, 'epoch': 0.87} {'loss': 0.7471, 'learning_rate': 9.178510288497123e-07, 'epoch': 0.87} {'loss': 0.7769, 'learning_rate': 9.15245157198108e-07, 'epoch': 0.87} {'loss': 0.7598, 'learning_rate': 9.126428125611342e-07, 'epoch': 0.87} {'loss': 0.2458, 'learning_rate': 9.10043995949108e-07, 'epoch': 0.87} {'loss': 0.7676, 'learning_rate': 9.074487083709759e-07, 'epoch': 0.87} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/912111364.jpg' {'loss': 0.7393, 'learning_rate': 9.04856950834323e-07, 'epoch': 0.87} {'loss': 0.7202, 'learning_rate': 9.022687243453554e-07, 'epoch': 0.87} {'loss': 0.7568, 'learning_rate': 8.996840299089149e-07, 'epoch': 0.87} {'loss': 0.7744, 'learning_rate': 8.971028685284655e-07, 'epoch': 0.87} {'loss': 0.7046, 'learning_rate': 8.945252412061056e-07, 'epoch': 0.87} {'loss': 0.793, 'learning_rate': 8.91951148942557e-07, 'epoch': 0.87} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/823412385.jpg' {'loss': 0.7402, 'learning_rate': 8.893805927371724e-07, 'epoch': 0.87} {'loss': 0.7148, 'learning_rate': 8.868135735879291e-07, 'epoch': 0.87} {'loss': 0.7183, 'learning_rate': 8.842500924914299e-07, 'epoch': 0.87} {'loss': 0.749, 'learning_rate': 8.816901504429143e-07, 'epoch': 0.87} {'loss': 0.7197, 'learning_rate': 8.791337484362305e-07, 'epoch': 0.87} {'loss': 0.6812, 'learning_rate': 8.765808874638682e-07, 'epoch': 0.87} {'loss': 0.6931, 'learning_rate': 8.740315685169364e-07, 'epoch': 0.87} {'loss': 0.7324, 'learning_rate': 8.714857925851617e-07, 'epoch': 0.87} {'loss': 0.7554, 'learning_rate': 8.689435606569086e-07, 'epoch': 0.87} {'loss': 0.7583, 'learning_rate': 8.664048737191566e-07, 'epoch': 0.87} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1559920734.jpg' {'loss': 0.7451, 'learning_rate': 8.638697327575108e-07, 'epoch': 0.87} {'loss': 0.7188, 'learning_rate': 8.613381387562015e-07, 'epoch': 0.87} {'loss': 0.7603, 'learning_rate': 8.588100926980802e-07, 'epoch': 0.87} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/553069403.jpg' {'loss': 0.7144, 'learning_rate': 8.56285595564621e-07, 'epoch': 0.87} {'loss': 0.7432, 'learning_rate': 8.537646483359185e-07, 'epoch': 0.87} {'loss': 0.7495, 'learning_rate': 8.512472519906978e-07, 'epoch': 0.87} {'loss': 0.7339, 'learning_rate': 8.487334075062914e-07, 'epoch': 0.87} {'loss': 0.7886, 'learning_rate': 8.462231158586654e-07, 'epoch': 0.87} {'loss': 0.7124, 
'learning_rate': 8.437163780224011e-07, 'epoch': 0.87} {'loss': 0.8286, 'learning_rate': 8.412131949706958e-07, 'epoch': 0.87} {'loss': 0.7515, 'learning_rate': 8.387135676753755e-07, 'epoch': 0.87} {'loss': 0.6985, 'learning_rate': 8.362174971068804e-07, 'epoch': 0.87} {'loss': 0.7637, 'learning_rate': 8.337249842342721e-07, 'epoch': 0.87} [2024-01-31 15:17:12,113] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7632, 'learning_rate': 8.312360300252287e-07, 'epoch': 0.87} [2024-01-31 15:17:31,114] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7578, 'learning_rate': 8.287506354460484e-07, 'epoch': 0.87} [2024-01-31 15:17:49,940] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7627, 'learning_rate': 8.26268801461646e-07, 'epoch': 0.87} {'loss': 0.7671, 'learning_rate': 8.237905290355563e-07, 'epoch': 0.87} [2024-01-31 15:18:30,416] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7515, 'learning_rate': 8.213158191299297e-07, 'epoch': 0.87} {'loss': 0.7817, 'learning_rate': 8.188446727055311e-07, 'epoch': 0.87} {'loss': 0.7124, 'learning_rate': 8.163770907217506e-07, 'epoch': 0.87} {'loss': 0.7368, 'learning_rate': 8.139130741365819e-07, 'epoch': 0.87} {'loss': 0.6936, 'learning_rate': 8.114526239066456e-07, 'epoch': 0.87} {'loss': 0.769, 'learning_rate': 8.08995740987173e-07, 'epoch': 0.87} {'loss': 0.7412, 'learning_rate': 8.065424263320054e-07, 'epoch': 0.88} {'loss': 0.7676, 'learning_rate': 8.040926808936112e-07, 'epoch': 0.88} {'loss': 0.7617, 'learning_rate': 8.016465056230616e-07, 'epoch': 0.88} {'loss': 0.7983, 'learning_rate': 7.99203901470047e-07, 'epoch': 0.88} {'loss': 0.7261, 'learning_rate': 7.967648693828712e-07, 'epoch': 0.88} {'loss': 0.7061, 'learning_rate': 7.943294103084487e-07, 'epoch': 0.88} {'loss': 0.7754, 'learning_rate': 7.9189752519231e-07, 'epoch': 0.88} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/377570177X.jpg' {'loss': 0.7373, 'learning_rate': 7.894692149785954e-07, 'epoch': 0.88} {'loss': 0.7495, 'learning_rate': 7.870444806100619e-07, 'epoch': 0.88} {'loss': 0.7876, 'learning_rate': 7.846233230280698e-07, 'epoch': 0.88} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/891960856.jpg' {'loss': 0.7402, 'learning_rate': 7.822057431725994e-07, 'epoch': 0.88} {'loss': 0.7427, 'learning_rate': 7.797917419822377e-07, 'epoch': 0.88} {'loss': 0.7505, 'learning_rate': 7.773813203941827e-07, 'epoch': 0.88} {'loss': 0.7764, 'learning_rate': 7.749744793442448e-07, 'epoch': 0.88} {'loss': 0.6885, 'learning_rate': 7.725712197668378e-07, 'epoch': 0.88} {'loss': 0.7085, 'learning_rate': 7.701715425949952e-07, 'epoch': 0.88} {'loss': 0.77, 'learning_rate': 7.677754487603517e-07, 'epoch': 0.88} {'loss': 0.7925, 'learning_rate': 7.653829391931533e-07, 'epoch': 0.88} {'loss': 0.7686, 'learning_rate': 7.629940148222559e-07, 'epoch': 0.88} [2024-01-31 15:26:07,713] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7495, 'learning_rate': 7.606086765751209e-07, 'epoch': 0.88} {'loss': 0.2544, 'learning_rate': 7.582269253778185e-07, 'epoch': 0.88} {'loss': 0.7588, 'learning_rate': 7.55848762155027e-07, 'epoch': 0.88} [2024-01-31 15:27:01,846] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7627, 'learning_rate': 7.534741878300333e-07, 'epoch': 0.88} {'loss': 0.7778, 'learning_rate': 7.511032033247256e-07, 'epoch': 0.88} [2024-01-31 15:27:37,051] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8096, 'learning_rate': 7.487358095596031e-07, 'epoch': 0.88} {'loss': 0.752, 'learning_rate': 7.463720074537728e-07, 'epoch': 0.88} [2024-01-31 15:28:18,168] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7412, 'learning_rate': 7.440117979249362e-07, 'epoch': 0.88} [2024-01-31 15:28:35,183] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7168, 'learning_rate': 7.416551818894158e-07, 'epoch': 0.88} {'loss': 0.7607, 'learning_rate': 7.393021602621264e-07, 'epoch': 0.88} [2024-01-31 15:29:19,167] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7622, 'learning_rate': 7.369527339565951e-07, 'epoch': 0.88} {'loss': 0.7676, 'learning_rate': 7.346069038849469e-07, 'epoch': 0.88} {'loss': 0.7598, 'learning_rate': 7.322646709579173e-07, 'epoch': 0.88} {'loss': 0.7173, 'learning_rate': 7.299260360848382e-07, 'epoch': 0.88} {'loss': 0.7212, 'learning_rate': 7.275910001736497e-07, 'epoch': 0.88} {'loss': 0.7939, 'learning_rate': 7.252595641308957e-07, 'epoch': 0.88} {'loss': 0.7583, 'learning_rate': 7.229317288617144e-07, 'epoch': 0.88} {'loss': 0.6997, 'learning_rate': 7.20607495269856e-07, 'epoch': 0.88} {'loss': 0.7568, 'learning_rate': 7.182868642576679e-07, 'epoch': 0.88} {'loss': 0.7856, 'learning_rate': 7.15969836726097e-07, 'epoch': 0.88} {'loss': 0.8179, 'learning_rate': 7.13656413574696e-07, 'epoch': 0.88} {'loss': 0.793, 'learning_rate': 7.113465957016097e-07, 'epoch': 0.88} {'loss': 0.7729, 'learning_rate': 7.090403840035942e-07, 'epoch': 0.88} {'loss': 0.7783, 'learning_rate': 7.067377793759999e-07, 'epoch': 0.88} {'loss': 0.7471, 'learning_rate': 7.044387827127752e-07, 'epoch': 0.88} {'loss': 0.7168, 'learning_rate': 7.021433949064704e-07, 'epoch': 0.88} {'loss': 0.7114, 'learning_rate': 6.99851616848235e-07, 'epoch': 0.88} {'loss': 0.7539, 'learning_rate': 6.975634494278149e-07, 'epoch': 0.88} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1570282358.jpg' {'loss': 0.7637, 'learning_rate': 6.952788935335541e-07, 'epoch': 0.88} {'loss': 0.7891, 'learning_rate': 6.92997950052402e-07, 'epoch': 0.88} {'loss': 0.7402, 'learning_rate': 6.907206198698912e-07, 'epoch': 0.88} {'loss': 0.6948, 'learning_rate': 6.884469038701646e-07, 'epoch': 0.88} {'loss': 0.7529, 'learning_rate': 6.861768029359595e-07, 'epoch': 0.88} {'loss': 0.7231, 'learning_rate': 6.839103179485995e-07, 'epoch': 0.89} {'loss': 0.7568, 'learning_rate': 6.816474497880177e-07, 'epoch': 0.89} {'loss': 0.272, 'learning_rate': 6.793881993327366e-07, 'epoch': 0.89} {'loss': 0.7441, 'learning_rate': 6.77132567459875e-07, 'epoch': 0.89} {'loss': 0.7388, 'learning_rate': 6.748805550451453e-07, 'epoch': 0.89} {'loss': 0.7651, 'learning_rate': 6.726321629628585e-07, 'epoch': 0.89} {'loss': 0.7427, 'learning_rate': 6.703873920859161e-07, 'epoch': 0.89} {'loss': 0.7607, 'learning_rate': 6.681462432858154e-07, 'epoch': 0.89} {'loss': 0.7476, 'learning_rate': 6.659087174326506e-07, 'epoch': 0.89} [2024-01-31 15:39:06,488] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.6599, 'learning_rate': 6.636748153951e-07, 'epoch': 0.89} [2024-01-31 15:39:25,128] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7544, 'learning_rate': 6.614445380404478e-07, 'epoch': 0.89} [2024-01-31 15:39:43,293] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.749, 'learning_rate': 6.592178862345622e-07, 'epoch': 0.89} {'loss': 0.7788, 'learning_rate': 6.569948608419041e-07, 'epoch': 0.89} {'loss': 0.666, 'learning_rate': 6.547754627255332e-07, 'epoch': 0.89} {'loss': 0.6929, 'learning_rate': 6.52559692747089e-07, 'epoch': 0.89} {'loss': 0.7563, 'learning_rate': 6.503475517668168e-07, 'epoch': 0.89} {'loss': 0.7529, 'learning_rate': 6.481390406435417e-07, 'epoch': 0.89} {'loss': 0.71, 'learning_rate': 6.459341602346858e-07, 'epoch': 0.89} {'loss': 0.7393, 'learning_rate': 6.437329113962576e-07, 'epoch': 0.89} {'loss': 0.7178, 'learning_rate': 6.415352949828601e-07, 'epoch': 0.89} {'loss': 0.7759, 'learning_rate': 6.393413118476821e-07, 'epoch': 0.89} {'loss': 0.7202, 'learning_rate': 6.371509628425021e-07, 'epoch': 0.89} {'loss': 0.7808, 'learning_rate': 6.349642488176943e-07, 'epoch': 0.89} {'loss': 0.7188, 'learning_rate': 6.327811706222097e-07, 'epoch': 0.89} {'loss': 0.7524, 'learning_rate': 6.306017291035981e-07, 'epoch': 0.89} {'loss': 0.7417, 'learning_rate': 6.284259251079939e-07, 'epoch': 0.89} {'loss': 0.7695, 'learning_rate': 6.262537594801177e-07, 'epoch': 0.89} {'loss': 0.79, 'learning_rate': 6.240852330632796e-07, 'epoch': 0.89} {'loss': 0.7383, 'learning_rate': 6.219203466993762e-07, 'epoch': 0.89} {'loss': 0.7349, 'learning_rate': 6.197591012288918e-07, 'epoch': 0.89} {'loss': 0.2578, 'learning_rate': 6.17601497490895e-07, 'epoch': 0.89} {'loss': 0.7808, 'learning_rate': 6.154475363230417e-07, 'epoch': 0.89} {'loss': 0.7314, 'learning_rate': 6.132972185615749e-07, 'epoch': 0.89} {'loss': 0.7485, 'learning_rate': 6.111505450413202e-07, 'epoch': 0.89} {'loss': 0.8066, 'learning_rate': 6.090075165956943e-07, 'epoch': 0.89} {'loss': 0.7231, 'learning_rate': 6.068681340566896e-07, 'epoch': 0.89} {'loss': 0.2852, 'learning_rate': 6.047323982548924e-07, 'epoch': 0.89} {'loss': 0.7407, 'learning_rate': 6.026003100194633e-07, 'epoch': 0.89} {'loss': 0.7427, 'learning_rate': 6.004718701781575e-07, 'epoch': 0.89} {'loss': 0.7466, 'learning_rate': 5.983470795573088e-07, 'epoch': 0.89} {'loss': 0.7646, 'learning_rate': 5.962259389818292e-07, 'epoch': 0.89} {'loss': 0.7354, 'learning_rate': 5.941084492752236e-07, 'epoch': 0.89} {'loss': 0.7451, 'learning_rate': 5.91994611259572e-07, 'epoch': 0.89} {'loss': 0.7085, 'learning_rate': 5.898844257555392e-07, 'epoch': 0.89} {'loss': 0.7031, 'learning_rate': 5.87777893582372e-07, 'epoch': 0.89} {'loss': 0.7178, 'learning_rate': 5.856750155578983e-07, 'epoch': 0.89} {'loss': 0.7246, 'learning_rate': 5.835757924985286e-07, 'epoch': 0.89} {'loss': 0.7451, 'learning_rate': 5.81480225219252e-07, 'epoch': 0.89} {'loss': 0.752, 'learning_rate': 5.793883145336443e-07, 'epoch': 0.89} {'loss': 0.7002, 'learning_rate': 5.773000612538505e-07, 'epoch': 0.89} {'loss': 0.7173, 
'learning_rate': 5.752154661906085e-07, 'epoch': 0.89} {'loss': 0.7778, 'learning_rate': 5.731345301532265e-07, 'epoch': 0.89} {'loss': 0.7676, 'learning_rate': 5.710572539495962e-07, 'epoch': 0.9} {'loss': 0.8115, 'learning_rate': 5.68983638386188e-07, 'epoch': 0.9} {'loss': 0.7817, 'learning_rate': 5.669136842680512e-07, 'epoch': 0.9} {'loss': 0.7915, 'learning_rate': 5.648473923988129e-07, 'epoch': 0.9} {'loss': 0.7832, 'learning_rate': 5.627847635806771e-07, 'epoch': 0.9} {'loss': 0.7402, 'learning_rate': 5.607257986144321e-07, 'epoch': 0.9} {'loss': 0.7769, 'learning_rate': 5.58670498299434e-07, 'epoch': 0.9} {'loss': 0.7461, 'learning_rate': 5.566188634336212e-07, 'epoch': 0.9} {'loss': 0.7324, 'learning_rate': 5.545708948135142e-07, 'epoch': 0.9} {'loss': 0.238, 'learning_rate': 5.525265932341984e-07, 'epoch': 0.9} {'loss': 0.7681, 'learning_rate': 5.504859594893475e-07, 'epoch': 0.9} {'loss': 0.7627, 'learning_rate': 5.484489943712013e-07, 'epoch': 0.9} {'loss': 0.7778, 'learning_rate': 5.464156986705826e-07, 'epoch': 0.9} {'loss': 0.7549, 'learning_rate': 5.443860731768869e-07, 'epoch': 0.9} {'loss': 0.7339, 'learning_rate': 5.423601186780836e-07, 'epoch': 0.9} {'loss': 0.7734, 'learning_rate': 5.403378359607181e-07, 'epoch': 0.9} {'loss': 0.7681, 'learning_rate': 5.383192258099113e-07, 'epoch': 0.9} {'loss': 0.6877, 'learning_rate': 5.36304289009355e-07, 'epoch': 0.9} {'loss': 0.7388, 'learning_rate': 5.342930263413193e-07, 'epoch': 0.9} {'loss': 0.7207, 'learning_rate': 5.322854385866439e-07, 'epoch': 0.9} {'loss': 0.8032, 'learning_rate': 5.302815265247452e-07, 'epoch': 0.9} {'loss': 0.6724, 'learning_rate': 5.282812909336077e-07, 'epoch': 0.9} {'loss': 0.73, 'learning_rate': 5.262847325897968e-07, 'epoch': 0.9} {'loss': 0.7524, 'learning_rate': 5.242918522684392e-07, 'epoch': 0.9} {'loss': 0.7651, 'learning_rate': 5.22302650743245e-07, 'epoch': 0.9} {'loss': 0.7056, 'learning_rate': 5.203171287864872e-07, 'epoch': 0.9} {'loss': 0.7139, 'learning_rate': 5.183352871690162e-07, 'epoch': 0.9} {'loss': 0.6963, 'learning_rate': 5.163571266602485e-07, 'epoch': 0.9} [2024-01-31 16:00:51,668] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8115, 'learning_rate': 5.143826480281778e-07, 'epoch': 0.9} [2024-01-31 16:01:09,285] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7607, 'learning_rate': 5.124118520393606e-07, 'epoch': 0.9} {'loss': 0.7192, 'learning_rate': 5.104447394589295e-07, 'epoch': 0.9} {'loss': 0.6851, 'learning_rate': 5.084813110505871e-07, 'epoch': 0.9} {'loss': 0.2742, 'learning_rate': 5.065215675766023e-07, 'epoch': 0.9} {'loss': 0.769, 'learning_rate': 5.045655097978131e-07, 'epoch': 0.9} {'loss': 0.2548, 'learning_rate': 5.026131384736321e-07, 'epoch': 0.9} {'loss': 0.7178, 'learning_rate': 5.006644543620342e-07, 'epoch': 0.9} {'loss': 0.7837, 'learning_rate': 4.987194582195687e-07, 'epoch': 0.9} {'loss': 0.7207, 'learning_rate': 4.967781508013459e-07, 'epoch': 0.9} {'loss': 0.2308, 'learning_rate': 4.948405328610506e-07, 'epoch': 0.9} {'loss': 0.751, 'learning_rate': 4.929066051509346e-07, 'epoch': 0.9} {'loss': 0.7017, 'learning_rate': 4.909763684218116e-07, 'epoch': 0.9} {'loss': 0.7417, 'learning_rate': 4.890498234230689e-07, 'epoch': 0.9} {'loss': 0.7529, 'learning_rate': 4.871269709026561e-07, 'epoch': 0.9} {'loss': 0.7402, 'learning_rate': 4.852078116070902e-07, 'epoch': 0.9} [2024-01-31 16:05:54,099] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7158, 'learning_rate': 4.832923462814565e-07, 'epoch': 0.9} [2024-01-31 16:06:12,650] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7024, 'learning_rate': 4.813805756694035e-07, 'epoch': 0.9} {'loss': 0.7297, 'learning_rate': 4.794725005131462e-07, 'epoch': 0.9} {'loss': 0.7622, 'learning_rate': 4.775681215534656e-07, 'epoch': 0.9} {'loss': 0.7319, 'learning_rate': 4.7566743952970894e-07, 'epoch': 0.9} {'loss': 0.7681, 'learning_rate': 4.7377045517978173e-07, 'epoch': 0.9} {'loss': 0.2557, 'learning_rate': 4.7187716924016355e-07, 'epoch': 0.9} {'loss': 0.7625, 'learning_rate': 4.6998758244588995e-07, 'epoch': 0.9} {'loss': 0.7988, 'learning_rate': 4.6810169553056616e-07, 'epoch': 0.91} {'loss': 0.708, 'learning_rate': 4.662195092263566e-07, 'epoch': 0.91} {'loss': 0.2531, 'learning_rate': 4.643410242639912e-07, 'epoch': 0.91} {'loss': 0.7192, 'learning_rate': 4.6246624137276206e-07, 'epoch': 0.91} {'loss': 0.7407, 'learning_rate': 4.605951612805237e-07, 'epoch': 0.91} {'loss': 0.7178, 'learning_rate': 4.587277847136984e-07, 'epoch': 0.91} {'loss': 0.2301, 'learning_rate': 4.568641123972606e-07, 'epoch': 0.91} {'loss': 0.7812, 'learning_rate': 4.550041450547549e-07, 'epoch': 0.91} {'loss': 0.7725, 'learning_rate': 4.5314788340828365e-07, 'epoch': 0.91} {'loss': 0.8115, 'learning_rate': 4.512953281785104e-07, 'epoch': 0.91} {'loss': 0.7212, 'learning_rate': 4.494464800846654e-07, 'epoch': 0.91} {'loss': 0.7656, 'learning_rate': 4.476013398445289e-07, 'epoch': 0.91} {'loss': 0.7485, 'learning_rate': 4.4575990817445234e-07, 'epoch': 0.91} {'loss': 0.2394, 'learning_rate': 4.4392218578934164e-07, 'epoch': 0.91} {'loss': 0.7886, 'learning_rate': 4.4208817340266385e-07, 'epoch': 0.91} {'loss': 0.7065, 'learning_rate': 4.4025787172644495e-07, 'epoch': 0.91} {'loss': 0.8081, 'learning_rate': 4.384312814712721e-07, 'epoch': 0.91} {'loss': 0.6826, 'learning_rate': 4.366084033462914e-07, 'epoch': 0.91} {'loss': 0.6687, 'learning_rate': 4.3478923805920335e-07, 'epoch': 0.91} {'loss': 0.8281, 'learning_rate': 4.329737863162753e-07, 'epoch': 0.91} [2024-01-31 16:14:44,697] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.729, 'learning_rate': 4.311620488223256e-07, 'epoch': 0.91} [2024-01-31 16:15:01,472] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7271, 'learning_rate': 4.2935402628073166e-07, 'epoch': 0.91} {'loss': 0.7231, 'learning_rate': 4.27549719393433e-07, 'epoch': 0.91} {'loss': 0.7368, 'learning_rate': 4.2574912886092166e-07, 'epoch': 0.91} {'loss': 0.7695, 'learning_rate': 4.239522553822495e-07, 'epoch': 0.91} {'loss': 0.7368, 'learning_rate': 4.221590996550251e-07, 'epoch': 0.91} {'loss': 0.6985, 'learning_rate': 4.203696623754139e-07, 'epoch': 0.91} {'loss': 0.7959, 'learning_rate': 4.1858394423813563e-07, 'epoch': 0.91} {'loss': 0.7603, 'learning_rate': 4.1680194593646696e-07, 'epoch': 0.91} {'loss': 0.6902, 'learning_rate': 4.1502366816224327e-07, 'epoch': 0.91} {'loss': 0.729, 'learning_rate': 4.1324911160585014e-07, 'epoch': 0.91} {'loss': 0.7778, 'learning_rate': 4.1147827695623643e-07, 'epoch': 0.91} {'loss': 0.7554, 'learning_rate': 4.097111649008967e-07, 'epoch': 0.91} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/761511253.jpg' [2024-01-31 16:18:47,341] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8018, 'learning_rate': 4.0794777612588543e-07, 'epoch': 0.91} {'loss': 0.2402, 'learning_rate': 4.061881113158117e-07, 'epoch': 0.91} {'loss': 0.7207, 'learning_rate': 4.044321711538368e-07, 'epoch': 0.91} [2024-01-31 16:19:48,303] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7856, 'learning_rate': 4.02679956321681e-07, 'epoch': 0.91} {'loss': 0.7305, 'learning_rate': 4.00931467499609e-07, 'epoch': 0.91} {'loss': 0.7612, 'learning_rate': 3.9918670536644776e-07, 'epoch': 0.91} {'loss': 0.686, 'learning_rate': 3.974456705995733e-07, 'epoch': 0.91} {'loss': 0.7441, 'learning_rate': 3.9570836387491487e-07, 'epoch': 0.91} {'loss': 0.7134, 'learning_rate': 3.9397478586695513e-07, 'epoch': 0.91} {'loss': 0.7651, 'learning_rate': 3.9224493724872915e-07, 'epoch': 0.91} {'loss': 0.7505, 'learning_rate': 3.90518818691823e-07, 'epoch': 0.91} {'loss': 0.6807, 'learning_rate': 3.8879643086637384e-07, 'epoch': 0.91} [2024-01-31 16:22:44,293] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7734, 'learning_rate': 3.8707777444107697e-07, 'epoch': 0.91} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/875730434.jpg' [2024-01-31 16:23:02,250] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7749, 'learning_rate': 3.8536285008316854e-07, 'epoch': 0.91} {'loss': 0.7827, 'learning_rate': 3.8365165845844266e-07, 'epoch': 0.91} {'loss': 0.7437, 'learning_rate': 3.819442002312457e-07, 'epoch': 0.91} {'loss': 0.2594, 'learning_rate': 3.8024047606446736e-07, 'epoch': 0.91} {'loss': 0.7339, 'learning_rate': 3.785404866195552e-07, 'epoch': 0.91} {'loss': 0.7222, 'learning_rate': 3.768442325565036e-07, 'epoch': 0.91} {'loss': 0.7656, 'learning_rate': 3.751517145338546e-07, 'epoch': 0.92} {'loss': 0.8052, 'learning_rate': 3.7346293320870363e-07, 'epoch': 0.92} {'loss': 0.7153, 'learning_rate': 3.717778892366941e-07, 'epoch': 0.92} {'loss': 0.7397, 'learning_rate': 3.700965832720171e-07, 'epoch': 0.92} [2024-01-31 16:26:06,251] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7769, 'learning_rate': 3.684190159674117e-07, 'epoch': 0.92} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1563705176.jpg' {'loss': 0.7388, 'learning_rate': 3.6674518797417236e-07, 'epoch': 0.92} [2024-01-31 16:26:42,225] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.8013, 'learning_rate': 3.6507509994213155e-07, 'epoch': 0.92} {'loss': 0.7075, 'learning_rate': 3.6340875251967946e-07, 'epoch': 0.92} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1560523573.jpg' {'loss': 0.7241, 'learning_rate': 3.617461463537464e-07, 'epoch': 0.92} [2024-01-31 16:27:35,784] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7329, 'learning_rate': 3.6008728208981157e-07, 'epoch': 0.92} {'loss': 0.7544, 'learning_rate': 3.5843216037190873e-07, 'epoch': 0.92} {'loss': 0.7793, 'learning_rate': 3.5678078184260834e-07, 'epoch': 0.92} [2024-01-31 16:28:29,054] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7407, 'learning_rate': 3.5513314714303524e-07, 'epoch': 0.92} {'loss': 0.6997, 'learning_rate': 3.5348925691285675e-07, 'epoch': 0.92} {'loss': 0.7026, 'learning_rate': 3.518491117902878e-07, 'epoch': 0.92} [2024-01-31 16:29:25,775] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7896, 'learning_rate': 3.502127124120891e-07, 'epoch': 0.92} {'loss': 0.7759, 'learning_rate': 3.48580059413568e-07, 'epoch': 0.92} {'loss': 0.6992, 'learning_rate': 3.4695115342857524e-07, 'epoch': 0.92} [2024-01-31 16:30:20,607] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7637, 'learning_rate': 3.4532599508950826e-07, 'epoch': 0.92} [2024-01-31 16:30:37,582] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.729, 'learning_rate': 3.437045850273113e-07, 'epoch': 0.92} {'loss': 0.751, 'learning_rate': 3.420869238714708e-07, 'epoch': 0.92} {'loss': 0.7466, 'learning_rate': 3.404730122500155e-07, 'epoch': 0.92} {'loss': 0.7026, 'learning_rate': 3.3886285078952753e-07, 'epoch': 0.92} {'loss': 0.7495, 'learning_rate': 3.3725644011512125e-07, 'epoch': 0.92} [2024-01-31 16:32:10,165] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7031, 'learning_rate': 3.356537808504634e-07, 'epoch': 0.92} {'loss': 0.7793, 'learning_rate': 3.3405487361776177e-07, 'epoch': 0.92} {'loss': 0.729, 'learning_rate': 3.3245971903776654e-07, 'epoch': 0.92} {'loss': 0.7388, 'learning_rate': 3.308683177297711e-07, 'epoch': 0.92} {'loss': 0.6919, 'learning_rate': 3.292806703116125e-07, 'epoch': 0.92} {'loss': 0.6812, 'learning_rate': 3.2769677739966975e-07, 'epoch': 0.92} {'loss': 0.75, 'learning_rate': 3.2611663960886665e-07, 'epoch': 0.92} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/739704745.jpg' {'loss': 0.7417, 'learning_rate': 3.245402575526646e-07, 'epoch': 0.92} {'loss': 0.7388, 'learning_rate': 3.2296763184306965e-07, 'epoch': 0.92} {'loss': 0.7158, 'learning_rate': 3.2139876309063233e-07, 'epoch': 0.92} {'loss': 0.6797, 'learning_rate': 3.198336519044376e-07, 'epoch': 0.92} {'loss': 0.7759, 'learning_rate': 3.182722988921161e-07, 'epoch': 0.92} {'loss': 0.728, 'learning_rate': 3.167147046598418e-07, 'epoch': 0.92} {'loss': 0.7329, 'learning_rate': 3.151608698123232e-07, 'epoch': 0.92} {'loss': 0.7534, 'learning_rate': 3.1361079495281443e-07, 'epoch': 0.92} {'loss': 0.7607, 'learning_rate': 3.1206448068310635e-07, 'epoch': 0.92} {'loss': 0.7515, 'learning_rate': 3.1052192760353316e-07, 'epoch': 0.92} {'loss': 0.7339, 'learning_rate': 3.0898313631296586e-07, 'epoch': 0.92} {'loss': 0.7563, 'learning_rate': 3.0744810740881646e-07, 'epoch': 0.92} {'loss': 0.7891, 'learning_rate': 3.0591684148703617e-07, 'epoch': 0.92} {'loss': 0.7686, 'learning_rate': 3.043893391421149e-07, 'epoch': 0.92} {'loss': 0.7881, 'learning_rate': 3.0286560096708275e-07, 'epoch': 0.92} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/531202852.jpg' {'loss': 0.6699, 'learning_rate': 3.013456275535054e-07, 'epoch': 0.92} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/870695916.jpg' {'loss': 0.7734, 'learning_rate': 2.998294194914897e-07, 'epoch': 0.92} {'loss': 0.7471, 'learning_rate': 2.983169773696815e-07, 'epoch': 0.92} {'loss': 0.7178, 'learning_rate': 2.968083017752599e-07, 'epoch': 0.92} {'loss': 0.7734, 'learning_rate': 2.953033932939464e-07, 'epoch': 0.92} {'loss': 0.7798, 'learning_rate': 2.938022525099982e-07, 'epoch': 0.92} {'loss': 0.2467, 'learning_rate': 2.9230488000621003e-07, 'epoch': 0.93} {'loss': 0.8027, 'learning_rate': 2.908112763639137e-07, 'epoch': 0.93} {'loss': 0.7368, 'learning_rate': 2.8932144216297643e-07, 'epoch': 0.93} {'loss': 0.7026, 'learning_rate': 2.878353779818044e-07, 'epoch': 0.93} {'loss': 0.7139, 'learning_rate': 2.863530843973372e-07, 'epoch': 0.93} {'loss': 0.7246, 'learning_rate': 2.848745619850546e-07, 'epoch': 0.93} {'loss': 0.7715, 'learning_rate': 2.833998113189662e-07, 'epoch': 0.93} {'loss': 0.7427, 'learning_rate': 2.8192883297162634e-07, 'epoch': 0.93} {'loss': 0.7412, 'learning_rate': 2.804616275141148e-07, 'epoch': 0.93} {'loss': 0.7285, 'learning_rate': 2.7899819551605256e-07, 'epoch': 0.93} {'loss': 0.7573, 'learning_rate': 2.7753853754559634e-07, 'epoch': 0.93} {'loss': 0.75, 'learning_rate': 2.760826541694328e-07, 'epoch': 0.93} {'loss': 0.769, 'learning_rate': 2.746305459527876e-07, 'epoch': 0.93} {'loss': 0.7432, 'learning_rate': 2.7318221345941865e-07, 'epoch': 0.93} {'loss': 0.7905, 'learning_rate': 2.717376572516184e-07, 'epoch': 0.93} 
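The recurring `stage3.py:1898:step` warnings above mean the PyTorch caching allocator had to flush its cache between optimizer steps, i.e. the run is close to the GPU memory limit; the warning's own suggestion is to either reduce memory consumption or call `get_accelerator().empty_cache()` at the same point on every rank. Below is a minimal sketch of that suggestion, assuming a hand-written DeepSpeed training loop (the loop actually used by this run is driven by the trainer and is not visible in the log); `flush_interval` is an illustrative constant, not a DeepSpeed setting.

```python
# Hedged sketch: flush the allocator cache on all ranks at the same step,
# as the DeepSpeed stage3 warning suggests. Assumes a hand-written loop;
# the real run uses a trainer whose internals are not shown in this log.
import torch.distributed as dist
from deepspeed.accelerator import get_accelerator

flush_interval = 50  # illustrative value, not a DeepSpeed config key

def train(engine, dataloader):
    for step, batch in enumerate(dataloader):
        loss = engine(**batch).loss   # assumes the model returns an object with .loss
        engine.backward(loss)
        engine.step()

        # Flush at the same step on every rank so no rank stalls waiting
        # for others that are still reclaiming memory.
        if step % flush_interval == 0:
            if dist.is_initialized():
                dist.barrier()
            get_accelerator().empty_cache()
```

If the flushes persist, the other remedy the warning names is lowering memory pressure itself, e.g. a smaller per-device batch size or tighter ZeRO-3 limits such as `stage3_max_live_parameters` in the DeepSpeed config; the exact settings used for this run are not visible in the log.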
{'loss': 0.255, 'learning_rate': 2.7029687789021377e-07, 'epoch': 0.93} {'loss': 0.7456, 'learning_rate': 2.688598759345651e-07, 'epoch': 0.93} {'loss': 0.7437, 'learning_rate': 2.67426651942565e-07, 'epoch': 0.93} {'loss': 0.7637, 'learning_rate': 2.659972064706406e-07, 'epoch': 0.93} {'loss': 0.7495, 'learning_rate': 2.645715400737536e-07, 'epoch': 0.93} {'loss': 0.7822, 'learning_rate': 2.631496533053934e-07, 'epoch': 0.93} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/472084798.jpg' {'loss': 0.7114, 'learning_rate': 2.6173154671758847e-07, 'epoch': 0.93} {'loss': 0.7222, 'learning_rate': 2.603172208608962e-07, 'epoch': 0.93} {'loss': 0.7476, 'learning_rate': 2.589066762844039e-07, 'epoch': 0.93} {'loss': 0.7778, 'learning_rate': 2.57499913535737e-07, 'epoch': 0.93} {'loss': 0.6877, 'learning_rate': 2.5609693316104745e-07, 'epoch': 0.93} {'loss': 0.7603, 'learning_rate': 2.5469773570502063e-07, 'epoch': 0.93} {'loss': 0.73, 'learning_rate': 2.5330232171087433e-07, 'epoch': 0.93} {'loss': 0.7529, 'learning_rate': 2.51910691720354e-07, 'epoch': 0.93} {'loss': 0.7676, 'learning_rate': 2.5052284627374077e-07, 'epoch': 0.93} {'loss': 0.7397, 'learning_rate': 2.491387859098426e-07, 'epoch': 0.93} {'loss': 0.7485, 'learning_rate': 2.477585111659997e-07, 'epoch': 0.93} {'loss': 0.7407, 'learning_rate': 2.463820225780811e-07, 'epoch': 0.93} {'loss': 0.7754, 'learning_rate': 2.4500932068049046e-07, 'epoch': 0.93} {'loss': 0.2606, 'learning_rate': 2.4364040600615477e-07, 'epoch': 0.93} {'loss': 0.7163, 'learning_rate': 2.422752790865346e-07, 'epoch': 0.93} {'loss': 0.7783, 'learning_rate': 2.409139404516203e-07, 'epoch': 0.93} {'loss': 0.7778, 'learning_rate': 2.3955639062992696e-07, 'epoch': 0.93} {'loss': 0.7109, 'learning_rate': 2.3820263014850741e-07, 'epoch': 0.93} {'loss': 0.7393, 'learning_rate': 2.3685265953293345e-07, 'epoch': 0.93} {'loss': 0.7734, 'learning_rate': 2.3550647930731362e-07, 'epoch': 0.93} {'loss': 0.7314, 'learning_rate': 2.3416408999427876e-07, 'epoch': 0.93} {'loss': 0.7485, 'learning_rate': 2.3282549211499307e-07, 'epoch': 0.93} {'loss': 0.7671, 'learning_rate': 2.3149068618914417e-07, 'epoch': 0.93} {'loss': 0.2767, 'learning_rate': 2.3015967273494867e-07, 'epoch': 0.93} {'loss': 0.8149, 'learning_rate': 2.2883245226915652e-07, 'epoch': 0.93} {'loss': 0.6748, 'learning_rate': 2.2750902530703667e-07, 'epoch': 0.93} {'loss': 0.7593, 'learning_rate': 2.2618939236238924e-07, 'epoch': 0.93} {'loss': 0.7822, 'learning_rate': 2.2487355394754328e-07, 'epoch': 0.93} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/849930987.jpg' {'loss': 0.7666, 'learning_rate': 2.2356151057334908e-07, 'epoch': 0.93} {'loss': 0.7769, 'learning_rate': 2.2225326274919135e-07, 'epoch': 0.93} {'loss': 0.77, 'learning_rate': 2.209488109829727e-07, 'epoch': 0.93} {'loss': 0.7036, 'learning_rate': 2.196481557811303e-07, 'epoch': 0.94} {'loss': 0.7148, 'learning_rate': 2.1835129764861907e-07, 'epoch': 0.94} {'loss': 0.7695, 'learning_rate': 2.1705823708892737e-07, 'epoch': 0.94} {'loss': 0.7417, 'learning_rate': 2.1576897460406477e-07, 'epoch': 0.94} {'loss': 0.7324, 'learning_rate': 2.144835106945664e-07, 'epoch': 0.94} {'loss': 0.7065, 'learning_rate': 2.1320184585949532e-07, 'epoch': 0.94} {'loss': 0.7793, 'learning_rate': 2.119239805964357e-07, 'epoch': 0.94} {'loss': 0.7271, 'learning_rate': 2.106499154015018e-07, 'epoch': 0.94} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/962726303.jpg' {'loss': 0.7881, 'learning_rate': 
2.0937965076932576e-07, 'epoch': 0.94} [2024-01-31 16:59:15,219] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7749, 'learning_rate': 2.0811318719307194e-07, 'epoch': 0.94} {'loss': 0.769, 'learning_rate': 2.0685052516442373e-07, 'epoch': 0.94} {'loss': 0.7407, 'learning_rate': 2.0559166517358787e-07, 'epoch': 0.94} {'loss': 0.7397, 'learning_rate': 2.0433660770930009e-07, 'epoch': 0.94} {'loss': 0.7534, 'learning_rate': 2.0308535325881616e-07, 'epoch': 0.94} {'loss': 0.7393, 'learning_rate': 2.0183790230791532e-07, 'epoch': 0.94} {'loss': 0.8169, 'learning_rate': 2.0059425534090128e-07, 'epoch': 0.94} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/B00XIZWWNC.jpg' {'loss': 0.7891, 'learning_rate': 1.9935441284059998e-07, 'epoch': 0.94} {'loss': 0.7993, 'learning_rate': 1.981183752883631e-07, 'epoch': 0.94} {'loss': 0.7554, 'learning_rate': 1.9688614316406006e-07, 'epoch': 0.94} {'loss': 0.6904, 'learning_rate': 1.9565771694608937e-07, 'epoch': 0.94} {'loss': 0.2847, 'learning_rate': 1.9443309711136393e-07, 'epoch': 0.94} {'loss': 0.7852, 'learning_rate': 1.9321228413532788e-07, 'epoch': 0.94} {'loss': 0.7129, 'learning_rate': 1.9199527849194098e-07, 'epoch': 0.94} {'loss': 0.769, 'learning_rate': 1.907820806536842e-07, 'epoch': 0.94} {'loss': 0.7861, 'learning_rate': 1.895726910915663e-07, 'epoch': 0.94} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1855326779.jpg' {'loss': 0.6978, 'learning_rate': 1.883671102751128e-07, 'epoch': 0.94} {'loss': 0.2455, 'learning_rate': 1.8716533867237153e-07, 'epoch': 0.94} {'loss': 0.7205, 'learning_rate': 1.859673767499115e-07, 'epoch': 0.94} {'loss': 0.6812, 'learning_rate': 1.847732249728218e-07, 'epoch': 0.94} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/967695104.jpg' {'loss': 0.7939, 'learning_rate': 1.83582883804716e-07, 'epoch': 0.94} {'loss': 0.2657, 'learning_rate': 1.8239635370772223e-07, 'epoch': 0.94} {'loss': 0.2582, 'learning_rate': 1.8121363514249534e-07, 'epoch': 0.94} {'loss': 0.7471, 'learning_rate': 1.8003472856820469e-07, 'epoch': 0.94} {'loss': 0.7417, 'learning_rate': 1.7885963444254528e-07, 'epoch': 0.94} {'loss': 0.7397, 'learning_rate': 1.7768835322172552e-07, 'epoch': 0.94} {'loss': 0.7764, 'learning_rate': 1.7652088536048052e-07, 'epoch': 0.94} {'loss': 0.2278, 'learning_rate': 1.7535723131206106e-07, 'epoch': 0.94} {'loss': 0.7104, 'learning_rate': 1.7419739152823468e-07, 'epoch': 0.94} {'loss': 0.7175, 'learning_rate': 1.7304136645929448e-07, 'epoch': 0.94} {'loss': 0.7627, 'learning_rate': 1.7188915655404814e-07, 'epoch': 0.94} {'loss': 0.7778, 'learning_rate': 1.707407622598223e-07, 'epoch': 0.94} {'loss': 0.7178, 'learning_rate': 1.695961840224636e-07, 'epoch': 0.94} {'loss': 0.7847, 'learning_rate': 1.6845542228633772e-07, 'epoch': 0.94} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/29344506.jpg' {'loss': 0.7383, 'learning_rate': 1.6731847749432705e-07, 'epoch': 0.94} {'loss': 0.75, 'learning_rate': 1.6618535008783075e-07, 'epoch': 0.94} {'loss': 0.7559, 'learning_rate': 1.6505604050677249e-07, 'epoch': 0.94} {'loss': 
0.8184, 'learning_rate': 1.6393054918958373e-07, 'epoch': 0.94} {'loss': 0.7251, 'learning_rate': 1.6280887657322276e-07, 'epoch': 0.94} {'loss': 0.7676, 'learning_rate': 1.616910230931612e-07, 'epoch': 0.94} {'loss': 0.752, 'learning_rate': 1.6057698918338526e-07, 'epoch': 0.94} {'loss': 0.7808, 'learning_rate': 1.5946677527640563e-07, 'epoch': 0.94} {'loss': 0.7603, 'learning_rate': 1.5836038180324198e-07, 'epoch': 0.94} {'loss': 0.7202, 'learning_rate': 1.5725780919343624e-07, 'epoch': 0.95} {'loss': 0.752, 'learning_rate': 1.561590578750438e-07, 'epoch': 0.95} {'loss': 0.7109, 'learning_rate': 1.55064128274639e-07, 'epoch': 0.95} {'loss': 0.8086, 'learning_rate': 1.5397302081731069e-07, 'epoch': 0.95} {'loss': 0.6821, 'learning_rate': 1.5288573592666445e-07, 'epoch': 0.95} {'loss': 0.7246, 'learning_rate': 1.518022740248215e-07, 'epoch': 0.95} {'loss': 0.7334, 'learning_rate': 1.5072263553241872e-07, 'epoch': 0.95} {'loss': 0.7554, 'learning_rate': 1.4964682086861082e-07, 'epoch': 0.95} {'loss': 0.7344, 'learning_rate': 1.4857483045106258e-07, 'epoch': 0.95} {'loss': 0.7607, 'learning_rate': 1.475066646959611e-07, 'epoch': 0.95} {'loss': 0.7988, 'learning_rate': 1.4644232401800352e-07, 'epoch': 0.95} {'loss': 0.7446, 'learning_rate': 1.4538180883040264e-07, 'epoch': 0.95} {'loss': 0.7695, 'learning_rate': 1.4432511954488915e-07, 'epoch': 0.95} {'loss': 0.7393, 'learning_rate': 1.4327225657170485e-07, 'epoch': 0.95} {'loss': 0.7627, 'learning_rate': 1.4222322031960723e-07, 'epoch': 0.95} {'loss': 0.7607, 'learning_rate': 1.411780111958694e-07, 'epoch': 0.95} {'loss': 0.2346, 'learning_rate': 1.4013662960627562e-07, 'epoch': 0.95} {'loss': 0.7759, 'learning_rate': 1.3909907595512806e-07, 'epoch': 0.95} {'loss': 0.7583, 'learning_rate': 1.3806535064524006e-07, 'epoch': 0.95} {'loss': 0.7598, 'learning_rate': 1.3703545407793951e-07, 'epoch': 0.95} {'loss': 0.7524, 'learning_rate': 1.360093866530665e-07, 'epoch': 0.95} {'loss': 0.7617, 'learning_rate': 1.34987148768978e-07, 'epoch': 0.95} {'loss': 0.752, 'learning_rate': 1.3396874082253986e-07, 'epoch': 0.95} {'loss': 0.7593, 'learning_rate': 1.3295416320913357e-07, 'epoch': 0.95} {'loss': 0.7388, 'learning_rate': 1.3194341632265518e-07, 'epoch': 0.95} {'loss': 0.7217, 'learning_rate': 1.3093650055550855e-07, 'epoch': 0.95} {'loss': 0.707, 'learning_rate': 1.2993341629861432e-07, 'epoch': 0.95} {'loss': 0.7725, 'learning_rate': 1.2893416394140323e-07, 'epoch': 0.95} {'loss': 0.7617, 'learning_rate': 1.279387438718216e-07, 'epoch': 0.95} {'loss': 0.7749, 'learning_rate': 1.269471564763247e-07, 'epoch': 0.95} {'loss': 0.73, 'learning_rate': 1.2595940213988024e-07, 'epoch': 0.95} {'loss': 0.752, 'learning_rate': 1.2497548124597026e-07, 'epoch': 0.95} {'loss': 0.8091, 'learning_rate': 1.2399539417658368e-07, 'epoch': 0.95} {'loss': 0.7354, 'learning_rate': 1.2301914131222726e-07, 'epoch': 0.95} {'loss': 0.7471, 'learning_rate': 1.2204672303191335e-07, 'epoch': 0.95} {'loss': 0.7744, 'learning_rate': 1.2107813971317106e-07, 'epoch': 0.95} {'loss': 0.7749, 'learning_rate': 1.201133917320363e-07, 'epoch': 0.95} {'loss': 0.7529, 'learning_rate': 1.1915247946305498e-07, 'epoch': 0.95} {'loss': 0.7065, 'learning_rate': 1.1819540327929092e-07, 'epoch': 0.95} {'loss': 0.7739, 'learning_rate': 1.1724216355231022e-07, 'epoch': 0.95} {'loss': 0.7261, 'learning_rate': 1.1629276065219575e-07, 'epoch': 0.95} {'loss': 0.749, 'learning_rate': 1.1534719494753821e-07, 'epoch': 0.95} {'loss': 0.749, 'learning_rate': 1.144054668054373e-07, 'epoch': 0.95} {'loss': 
0.8047, 'learning_rate': 1.1346757659150498e-07, 'epoch': 0.95} {'loss': 0.7271, 'learning_rate': 1.1253352466986334e-07, 'epoch': 0.95} {'loss': 0.7749, 'learning_rate': 1.116033114031434e-07, 'epoch': 0.95} {'loss': 0.7192, 'learning_rate': 1.1067693715248406e-07, 'epoch': 0.95} {'loss': 0.77, 'learning_rate': 1.0975440227753764e-07, 'epoch': 0.95} {'loss': 0.7583, 'learning_rate': 1.0883570713646318e-07, 'epoch': 0.95} {'loss': 0.6763, 'learning_rate': 1.0792085208593095e-07, 'epoch': 0.95} {'loss': 0.7432, 'learning_rate': 1.0700983748111792e-07, 'epoch': 0.95} {'loss': 0.7234, 'learning_rate': 1.061026636757101e-07, 'epoch': 0.95} {'loss': 0.7319, 'learning_rate': 1.0519933102190682e-07, 'epoch': 0.96} {'loss': 0.6865, 'learning_rate': 1.0429983987041092e-07, 'epoch': 0.96} {'loss': 0.7461, 'learning_rate': 1.0340419057043527e-07, 'epoch': 0.96} {'loss': 0.7415, 'learning_rate': 1.0251238346970393e-07, 'epoch': 0.96} {'loss': 0.7485, 'learning_rate': 1.0162441891444441e-07, 'epoch': 0.96} {'loss': 0.7175, 'learning_rate': 1.007402972493976e-07, 'epoch': 0.96} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/843129697.jpg' {'loss': 0.7803, 'learning_rate': 9.986001881780783e-08, 'epoch': 0.96} {'loss': 0.7251, 'learning_rate': 9.898358396143171e-08, 'epoch': 0.96} {'loss': 0.7388, 'learning_rate': 9.811099302052928e-08, 'epoch': 0.96} {'loss': 0.7666, 'learning_rate': 9.72422463338718e-08, 'epoch': 0.96} {'loss': 0.2483, 'learning_rate': 9.637734423873612e-08, 'epoch': 0.96} {'loss': 0.7588, 'learning_rate': 9.55162870709081e-08, 'epoch': 0.96} {'loss': 0.7837, 'learning_rate': 9.465907516467698e-08, 'epoch': 0.96} {'loss': 0.7417, 'learning_rate': 9.380570885284546e-08, 'epoch': 0.96} {'loss': 0.7549, 'learning_rate': 9.295618846671739e-08, 'epoch': 0.96} {'loss': 0.769, 'learning_rate': 9.211051433610674e-08, 'epoch': 0.96} [2024-01-31 17:33:19,682] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7905, 'learning_rate': 9.126868678933198e-08, 'epoch': 0.96} [2024-01-31 17:33:36,310] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7778, 'learning_rate': 9.04307061532217e-08, 'epoch': 0.96} {'loss': 0.7183, 'learning_rate': 8.959657275310674e-08, 'epoch': 0.96} {'loss': 0.748, 'learning_rate': 8.876628691282918e-08, 'epoch': 0.96} {'loss': 0.7432, 'learning_rate': 8.793984895473117e-08, 'epoch': 0.96} {'loss': 0.7983, 'learning_rate': 8.711725919966718e-08, 'epoch': 0.96} {'loss': 0.2351, 'learning_rate': 8.629851796699284e-08, 'epoch': 0.96} {'loss': 0.7949, 'learning_rate': 8.54836255745728e-08, 'epoch': 0.96} {'loss': 0.6694, 'learning_rate': 8.467258233877728e-08, 'epoch': 0.96} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/674035275.jpg' {'loss': 0.7412, 'learning_rate': 8.386538857447779e-08, 'epoch': 0.96} {'loss': 0.7671, 'learning_rate': 8.306204459505807e-08, 'epoch': 0.96} {'loss': 0.7427, 'learning_rate': 8.226255071240308e-08, 'epoch': 0.96} {'loss': 0.7485, 'learning_rate': 8.146690723690342e-08, 'epoch': 0.96} {'loss': 0.77, 'learning_rate': 8.067511447745535e-08, 'epoch': 0.96} {'loss': 0.7104, 'learning_rate': 7.988717274146074e-08, 'epoch': 0.96} {'loss': 0.269, 'learning_rate': 7.910308233482488e-08, 'epoch': 0.96} {'loss': 0.7563, 'learning_rate': 7.832284356195764e-08, 'epoch': 0.96} {'loss': 0.7505, 'learning_rate': 7.754645672577776e-08, 'epoch': 0.96} {'loss': 0.7715, 'learning_rate': 7.677392212770196e-08, 'epoch': 0.96} {'loss': 0.752, 'learning_rate': 7.600524006765808e-08, 'epoch': 0.96} {'loss': 0.7256, 'learning_rate': 7.524041084407185e-08, 'epoch': 0.96} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/939302322.jpg' {'loss': 0.7632, 'learning_rate': 7.447943475387797e-08, 'epoch': 0.96} {'loss': 0.7197, 'learning_rate': 7.372231209251346e-08, 'epoch': 0.96} {'loss': 0.7046, 'learning_rate': 7.296904315391873e-08, 'epoch': 0.96} [2024-01-31 17:43:48,128] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7593, 'learning_rate': 7.221962823053874e-08, 'epoch': 0.96} [2024-01-31 17:44:06,365] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7529, 'learning_rate': 7.147406761332298e-08, 'epoch': 0.96} {'loss': 0.752, 'learning_rate': 7.073236159172325e-08, 'epoch': 0.96} [2024-01-31 17:44:45,258] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7358, 'learning_rate': 6.999451045369587e-08, 'epoch': 0.96} {'loss': 0.7925, 'learning_rate': 6.926051448569948e-08, 'epoch': 0.96} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1564583031.jpg' [2024-01-31 17:45:20,525] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7197, 'learning_rate': 6.853037397269724e-08, 'epoch': 0.96} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1882419065.jpg' {'loss': 0.8677, 'learning_rate': 6.78040891981524e-08, 'epoch': 0.96} {'loss': 0.7642, 'learning_rate': 6.70816604440383e-08, 'epoch': 0.96} {'loss': 0.7397, 'learning_rate': 6.63630879908217e-08, 'epoch': 0.96} {'loss': 0.7593, 'learning_rate': 6.564837211748054e-08, 'epoch': 0.96} {'loss': 0.2463, 'learning_rate': 6.493751310149177e-08, 'epoch': 0.96} {'loss': 0.7632, 'learning_rate': 6.42305112188335e-08, 'epoch': 0.96} {'loss': 0.7422, 'learning_rate': 6.352736674398951e-08, 'epoch': 0.97} {'loss': 0.8262, 'learning_rate': 6.282807994994477e-08, 'epoch': 0.97} {'loss': 0.7314, 'learning_rate': 6.213265110818656e-08, 'epoch': 0.97} {'loss': 0.73, 'learning_rate': 6.144108048870335e-08, 'epoch': 0.97} {'loss': 0.7256, 'learning_rate': 6.075336835998813e-08, 'epoch': 0.97} [2024-01-31 17:49:00,478] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7554, 'learning_rate': 6.00695149890329e-08, 'epoch': 0.97} {'loss': 0.748, 'learning_rate': 5.938952064133419e-08, 'epoch': 0.97} {'loss': 0.7266, 'learning_rate': 5.871338558088857e-08, 'epoch': 0.97} {'loss': 0.7275, 'learning_rate': 5.8041110070194976e-08, 'epoch': 0.97} {'loss': 0.7964, 'learning_rate': 5.7372694370254614e-08, 'epoch': 0.97} {'loss': 0.7627, 'learning_rate': 5.67081387405688e-08, 'epoch': 0.97} {'loss': 0.7583, 'learning_rate': 5.6047443439141146e-08, 'epoch': 0.97} {'loss': 0.7144, 'learning_rate': 5.539060872247537e-08, 'epoch': 0.97} {'loss': 0.769, 'learning_rate': 5.47376348455797e-08, 'epoch': 0.97} {'loss': 0.7046, 'learning_rate': 5.408852206195914e-08, 'epoch': 0.97} {'loss': 0.7502, 'learning_rate': 5.344327062362098e-08, 'epoch': 0.97} {'loss': 0.2401, 'learning_rate': 5.2801880781075954e-08, 'epoch': 0.97} {'loss': 0.7549, 'learning_rate': 5.216435278333376e-08, 'epoch': 0.97} {'loss': 0.7271, 'learning_rate': 5.153068687790197e-08, 'epoch': 0.97} {'loss': 0.7437, 'learning_rate': 5.0900883310794903e-08, 'epoch': 0.97} {'loss': 0.7471, 'learning_rate': 5.0274942326521414e-08, 'epoch': 0.97} [Errno 2] No such file or directory: './playground/data/ocr_vqa/images/156347185X.jpg' {'loss': 0.7495, 'learning_rate': 4.9652864168096e-08, 'epoch': 0.97} {'loss': 0.71, 'learning_rate': 4.9034649077027706e-08, 'epoch': 0.97} {'loss': 0.7271, 'learning_rate': 4.84202972933312e-08, 'epoch': 0.97} [2024-01-31 17:55:00,602] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.749, 'learning_rate': 4.7809809055517906e-08, 'epoch': 0.97} {'loss': 0.7529, 'learning_rate': 4.720318460060047e-08, 'epoch': 0.97} [2024-01-31 17:55:38,086] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.7476, 'learning_rate': 4.6600424164091606e-08, 'epoch': 0.97} {'loss': 0.7686, 'learning_rate': 4.6001527980004125e-08, 'epoch': 0.97} [2024-01-31 17:56:13,505] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
{'loss': 0.7583, 'learning_rate': 4.54064962808487e-08, 'epoch': 0.97}
{'loss': 0.7866, 'learning_rate': 4.4815329297639434e-08, 'epoch': 0.97}
[2024-01-31 17:56:55,240] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.7656, 'learning_rate': 4.422802725988606e-08, 'epoch': 0.97}
[2024-01-31 17:57:13,649] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.7681, 'learning_rate': 4.364459039559843e-08, 'epoch': 0.97}
{'loss': 0.7344, 'learning_rate': 4.3065018931289784e-08, 'epoch': 0.97}
[2024-01-31 17:57:52,459] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.7329, 'learning_rate': 4.248931309196791e-08, 'epoch': 0.97}
[2024-01-31 17:58:13,093] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.7734, 'learning_rate': 4.1917473101140696e-08, 'epoch': 0.97}
[2024-01-31 17:58:31,625] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.7305, 'learning_rate': 4.134949918081832e-08, 'epoch': 0.97}
[Errno 2] No such file or directory: './playground/data/ocr_vqa/images/080740604X.jpg'
[2024-01-31 17:58:49,798] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.7876, 'learning_rate': 4.0785391551506626e-08, 'epoch': 0.97}
{'loss': 0.7588, 'learning_rate': 4.022515043221154e-08, 'epoch': 0.97}
[2024-01-31 17:59:24,887] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.7578, 'learning_rate': 3.966877604043795e-08, 'epoch': 0.97}
[2024-01-31 17:59:44,111] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.7246, 'learning_rate': 3.9116268592189755e-08, 'epoch': 0.97}
[Errno 2] No such file or directory: './playground/data/ocr_vqa/images/64462013.jpg'
{'loss': 0.7559, 'learning_rate': 3.8567628301969806e-08, 'epoch': 0.97}
[Errno 2] No such file or directory: './playground/data/ocr_vqa/images/819183482.jpg'
[2024-01-31 18:00:20,965] [WARNING] [stage3.py:1898:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.7817, 'learning_rate': 3.802285538277772e-08, 'epoch': 0.97}
[2024-01-31 18:00:41,240] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
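The recurring [Errno 2] lines show that some OCR-VQA images referenced by the annotations are missing under ./playground/data/ocr_vqa/images, so those samples are skipped at load time. A small pre-flight check can list them before launching training; this is a sketch that assumes a LLaVA-style annotation JSON where each record may carry an 'image' field with a path relative to the data folder (the file names below are illustrative, not taken from this run):

import json
import os

# Hypothetical paths: adjust to the annotation file and image root actually used.
ANNOTATIONS = "./playground/data/train_annotations.json"
IMAGE_ROOT = "./playground/data"

with open(ANNOTATIONS) as f:
    records = json.load(f)

# Collect every referenced image path that does not exist on disk.
missing = [
    rec["image"]
    for rec in records
    if "image" in rec and not os.path.exists(os.path.join(IMAGE_ROOT, rec["image"]))
]

print(f"{len(missing)} referenced images are missing")
for path in missing[:20]:
    print(path)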
{'loss': 0.752, 'learning_rate': 3.748195004611543e-08, 'epoch': 0.97}
{'loss': 0.7314, 'learning_rate': 3.69449125019794e-08, 'epoch': 0.97}
{'loss': 0.6934, 'learning_rate': 3.6411742958866184e-08, 'epoch': 0.97}
{'loss': 0.7466, 'learning_rate': 3.588244162377019e-08, 'epoch': 0.97}
{'loss': 0.8013, 'learning_rate': 3.5357008702185945e-08, 'epoch': 0.97}
{'loss': 0.7783, 'learning_rate': 3.483544439810249e-08, 'epoch': 0.97}
{'loss': 0.7656, 'learning_rate': 3.4317748914011187e-08, 'epoch': 0.97}
{'loss': 0.731, 'learning_rate': 3.3803922450897917e-08, 'epoch': 0.97}
[Errno 2] No such file or directory: './playground/data/ocr_vqa/images/739704133.jpg'
{'loss': 0.79, 'learning_rate': 3.329396520824757e-08, 'epoch': 0.97}
{'loss': 0.7124, 'learning_rate': 3.2787877384045095e-08, 'epoch': 0.97}
{'loss': 0.7617, 'learning_rate': 3.228565917476889e-08, 'epoch': 0.98}
{'loss': 0.7959, 'learning_rate': 3.178731077539743e-08, 'epoch': 0.98}
{'loss': 0.7661, 'learning_rate': 3.129283237940928e-08, 'epoch': 0.98}
{'loss': 0.7407, 'learning_rate': 3.080222417877421e-08, 'epoch': 0.98}
{'loss': 0.7764, 'learning_rate': 3.031548636396764e-08, 'epoch': 0.98}
{'loss': 0.7671, 'learning_rate': 2.983261912395397e-08, 'epoch': 0.98}
{'loss': 0.7295, 'learning_rate': 2.9353622646199898e-08, 'epoch': 0.98}
{'loss': 0.7612, 'learning_rate': 2.8878497116671124e-08, 'epoch': 0.98}
{'loss': 0.749, 'learning_rate': 2.8407242719823424e-08, 'epoch': 0.98}
{'loss': 0.2679, 'learning_rate': 2.7939859638617118e-08, 'epoch': 0.98}
{'loss': 0.7744, 'learning_rate': 2.7476348054504832e-08, 'epoch': 0.98}
{'loss': 0.7637, 'learning_rate': 2.7016708147439285e-08, 'epoch': 0.98}
{'loss': 0.7368, 'learning_rate': 2.6560940095866626e-08, 'epoch': 0.98}
{'loss': 0.7632, 'learning_rate': 2.6109044076733092e-08, 'epoch': 0.98}
{'loss': 0.7305, 'learning_rate': 2.5661020265479452e-08, 'epoch': 0.98}
{'loss': 0.79, 'learning_rate': 2.5216868836043242e-08, 'epoch': 0.98}
{'loss': 0.7935, 'learning_rate': 2.4776589960862074e-08, 'epoch': 0.98}
{'loss': 0.8037, 'learning_rate': 2.434018381086589e-08, 'epoch': 0.98}
{'loss': 0.7124, 'learning_rate': 2.3907650555481387e-08, 'epoch': 0.98}
{'loss': 0.7188, 'learning_rate': 2.3478990362634235e-08, 'epoch': 0.98}
{'loss': 0.7007, 'learning_rate': 2.3054203398743537e-08, 'epoch': 0.98}
{'loss': 0.7725, 'learning_rate': 2.263328982872959e-08, 'epoch': 0.98}
{'loss': 0.75, 'learning_rate': 2.221624981600168e-08, 'epoch': 0.98}
{'loss': 0.7271, 'learning_rate': 2.1803083522471402e-08, 'epoch': 0.98}
{'loss': 0.7007, 'learning_rate': 2.1393791108542672e-08, 'epoch': 0.98}
{'loss': 0.7266, 'learning_rate': 2.098837273311838e-08, 'epoch': 0.98}
{'loss': 0.7725, 'learning_rate': 2.058682855359595e-08, 'epoch': 0.98}
{'loss': 0.7246, 'learning_rate': 2.0189158725867353e-08, 'epoch': 0.98}
{'loss': 0.7129, 'learning_rate': 1.979536340432131e-08, 'epoch': 0.98}
{'loss': 0.811, 'learning_rate': 1.9405442741844415e-08, 'epoch': 0.98}
{'loss': 0.7876, 'learning_rate': 1.9019396889816688e-08, 'epoch': 0.98}
{'loss': 0.7515, 'learning_rate': 1.8637225998114904e-08, 'epoch': 0.98}
{'loss': 0.7642, 'learning_rate': 1.825893021510927e-08, 'epoch': 0.98}
{'loss': 0.7441, 'learning_rate': 1.7884509687668972e-08, 'epoch': 0.98}
{'loss': 0.7744, 'learning_rate': 1.7513964561156617e-08, 'epoch': 0.98}
{'loss': 0.7817, 'learning_rate': 1.714729497942935e-08, 'epoch': 0.98}
{'loss': 0.8364, 'learning_rate': 1.6784501084843307e-08, 'epoch': 0.98}
{'loss': 0.7446, 'learning_rate': 1.6425583018244706e-08, 'epoch': 0.98}
{'loss': 0.7266, 'learning_rate': 1.607054091897986e-08, 'epoch': 0.98}
{'loss': 0.7988, 'learning_rate': 1.57193749248874e-08, 'epoch': 0.98}
{'loss': 0.7012, 'learning_rate': 1.537208517230271e-08, 'epoch': 0.98}
{'loss': 0.77, 'learning_rate': 1.5028671796055715e-08, 'epoch': 0.98}
{'loss': 0.7026, 'learning_rate': 1.4689134929470884e-08, 'epoch': 0.98}
{'loss': 0.8267, 'learning_rate': 1.435347470436832e-08, 'epoch': 0.98}
{'loss': 0.7661, 'learning_rate': 1.4021691251062675e-08, 'epoch': 0.98}
{'loss': 0.6987, 'learning_rate': 1.3693784698363133e-08, 'epoch': 0.98}
{'loss': 0.7437, 'learning_rate': 1.3369755173575639e-08, 'epoch': 0.98}
{'loss': 0.2869, 'learning_rate': 1.3049602802498451e-08, 'epoch': 0.98}
{'loss': 0.7554, 'learning_rate': 1.273332770942659e-08, 'epoch': 0.98}
[Errno 2] No such file or directory: './playground/data/ocr_vqa/images/60191341.jpg'
{'loss': 0.7314, 'learning_rate': 1.2420930017148503e-08, 'epoch': 0.98}
[Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1564582922.jpg'
{'loss': 0.7261, 'learning_rate': 1.2112409846947171e-08, 'epoch': 0.98}
{'loss': 0.7451, 'learning_rate': 1.1807767318602337e-08, 'epoch': 0.98}
{'loss': 0.8066, 'learning_rate': 1.150700255038606e-08, 'epoch': 0.99}
{'loss': 0.7031, 'learning_rate': 1.1210115659063825e-08, 'epoch': 0.99}
{'loss': 0.7305, 'learning_rate': 1.0917106759900097e-08, 'epoch': 0.99}
{'loss': 0.7236, 'learning_rate': 1.0627975966649439e-08, 'epoch': 0.99}
{'loss': 0.6929, 'learning_rate': 1.034272339156206e-08, 'epoch': 0.99}
{'loss': 0.7676, 'learning_rate': 1.0061349145383814e-08, 'epoch': 0.99}
[Errno 2] No such file or directory: './playground/data/ocr_vqa/images/093727464X.jpg'
{'loss': 0.791, 'learning_rate': 9.783853337353987e-09, 'epoch': 0.99}
{'loss': 0.7656, 'learning_rate': 9.510236075205292e-09, 'epoch': 0.99}
{'loss': 0.7935, 'learning_rate': 9.240497465164978e-09, 'epoch': 0.99}
{'loss': 0.7583, 'learning_rate': 8.974637611955939e-09, 'epoch': 0.99}
{'loss': 0.7144, 'learning_rate': 8.712656618793391e-09, 'epoch': 0.99}
{'loss': 0.7739, 'learning_rate': 8.454554587388198e-09, 'epoch': 0.99}
[Errno 2] No such file or directory: './playground/data/ocr_vqa/images/037550267X.jpg'
[Errno 2] No such file or directory: './playground/data/ocr_vqa/images/1564583015.jpg'
{'loss': 0.7568, 'learning_rate': 8.200331617943535e-09, 'epoch': 0.99}
{'loss': 0.7837, 'learning_rate': 7.949987809158232e-09, 'epoch': 0.99}
{'loss': 0.7378, 'learning_rate': 7.703523258223433e-09, 'epoch': 0.99}
{'loss': 0.7983, 'learning_rate': 7.460938060825929e-09, 'epoch': 0.99}
{'loss': 0.8057, 'learning_rate': 7.222232311145938e-09, 'epoch': 0.99}
{'loss': 0.729, 'learning_rate': 6.987406101855998e-09, 'epoch': 0.99}
{'loss': 0.7524, 'learning_rate': 6.756459524125403e-09, 'epoch': 0.99}
{'loss': 0.8486, 'learning_rate': 6.5293926676135434e-09, 'epoch': 0.99}
{'loss': 0.7207, 'learning_rate': 6.306205620477679e-09, 'epoch': 0.99}
{'loss': 0.7476, 'learning_rate': 6.086898469365166e-09, 'epoch': 0.99}
{'loss': 0.7827, 'learning_rate': 5.871471299419007e-09, 'epoch': 0.99}
[Errno 2] No such file or directory: './playground/data/ocr_vqa/images/375501983.jpg'
{'loss': 0.7612, 'learning_rate': 5.6599241942767445e-09, 'epoch': 0.99}
{'loss': 0.769, 'learning_rate': 5.452257236066017e-09, 'epoch': 0.99}
{'loss': 0.7666, 'learning_rate': 5.248470505412328e-09, 'epoch': 0.99}
{'loss': 0.6982, 'learning_rate': 5.0485640814312844e-09, 'epoch': 0.99}
{'loss': 0.7578, 'learning_rate': 4.8525380417330234e-09, 'epoch': 0.99}
{'loss': 0.7192, 'learning_rate': 4.660392462424446e-09, 'epoch': 0.99}
{'loss': 0.7559, 'learning_rate': 4.472127418099215e-09, 'epoch': 0.99}
[Errno 2] No such file or directory: './playground/data/ocr_vqa/images/087033512X.jpg'
{'loss': 0.7764, 'learning_rate': 4.287742981851084e-09, 'epoch': 0.99}
{'loss': 0.6768, 'learning_rate': 4.1072392252639034e-09, 'epoch': 0.99}
{'loss': 0.6841, 'learning_rate': 3.930616218414951e-09, 'epoch': 0.99}
[Errno 2] No such file or directory: './playground/data/ocr_vqa/images/B00XLZW19O.jpg'
{'loss': 0.7739, 'learning_rate': 3.757874029874931e-09, 'epoch': 0.99}
{'loss': 0.7642, 'learning_rate': 3.5890127267090844e-09, 'epoch': 0.99}
{'loss': 0.748, 'learning_rate': 3.424032374476083e-09, 'epoch': 0.99}
{'loss': 0.7783, 'learning_rate': 3.2629330372246915e-09, 'epoch': 0.99}
{'loss': 0.6978, 'learning_rate': 3.105714777501545e-09, 'epoch': 0.99}
{'loss': 0.7188, 'learning_rate': 2.9523776563422644e-09, 'epoch': 0.99}
{'loss': 0.7554, 'learning_rate': 2.802921733278119e-09, 'epoch': 0.99}
{'loss': 0.7393, 'learning_rate': 2.657347066333804e-09, 'epoch': 0.99}
{'loss': 0.6638, 'learning_rate': 2.5156537120263335e-09, 'epoch': 0.99}
{'loss': 0.7622, 'learning_rate': 2.3778417253650376e-09, 'epoch': 0.99}
{'loss': 0.7861, 'learning_rate': 2.2439111598537844e-09, 'epoch': 0.99}
{'loss': 0.2557, 'learning_rate': 2.113862067488759e-09, 'epoch': 0.99}
{'loss': 0.7358, 'learning_rate': 1.987694498760684e-09, 'epoch': 0.99}
{'loss': 0.8052, 'learning_rate': 1.865408502650379e-09, 'epoch': 0.99}
{'loss': 0.7749, 'learning_rate': 1.747004126635421e-09, 'epoch': 0.99}
{'loss': 0.7158, 'learning_rate': 1.6324814166823744e-09, 'epoch': 0.99}
{'loss': 0.7168, 'learning_rate': 1.5218404172545609e-09, 'epoch': 0.99}
{'loss': 0.7285, 'learning_rate': 1.415081171305399e-09, 'epoch': 0.99}
{'loss': 0.7188, 'learning_rate': 1.3122037202828452e-09, 'epoch': 0.99}
{'loss': 0.7271, 'learning_rate': 1.2132081041282829e-09, 'epoch': 1.0}
{'loss': 0.772, 'learning_rate': 1.1180943612754124e-09, 'epoch': 1.0}
{'loss': 0.7197, 'learning_rate': 1.026862528649142e-09, 'epoch': 1.0}
WARNING: tokenization mismatch: 1 vs. 64. (ignored)
{'loss': 0.7314, 'learning_rate': 9.39512641668916e-10, 'epoch': 1.0}
{'loss': 0.7065, 'learning_rate': 8.560447342487177e-10, 'epoch': 1.0}
{'loss': 0.7434, 'learning_rate': 7.764588387915161e-10, 'epoch': 1.0}
{'loss': 0.7505, 'learning_rate': 7.007549861970387e-10, 'epoch': 1.0}
{'loss': 0.7642, 'learning_rate': 6.289332058551089e-10, 'epoch': 1.0}
{'loss': 0.7061, 'learning_rate': 5.609935256500887e-10, 'epoch': 1.0}
{'loss': 0.7686, 'learning_rate': 4.969359719586563e-10, 'epoch': 1.0}
{'loss': 0.7837, 'learning_rate': 4.3676056964869764e-10, 'epoch': 1.0}
{'loss': 0.8057, 'learning_rate': 3.804673420837457e-10, 'epoch': 1.0}
{'loss': 0.7588, 'learning_rate': 3.2805631111743064e-10, 'epoch': 1.0}
{'loss': 0.7363, 'learning_rate': 2.795274971001405e-10, 'epoch': 1.0}
{'loss': 0.8062, 'learning_rate': 2.3488091886902933e-10, 'epoch': 1.0}
{'loss': 0.7549, 'learning_rate': 1.941165937602296e-10, 'epoch': 1.0}
{'loss': 0.7485, 'learning_rate': 1.5723453759886042e-10, 'epoch': 1.0}
{'loss': 0.7358, 'learning_rate': 1.2423476470346808e-10, 'epoch': 1.0}
{'loss': 0.7896, 'learning_rate': 9.511728788602625e-11, 'epoch': 1.0}
{'loss': 0.2249, 'learning_rate': 6.988211845082582e-11, 'epoch': 1.0}
{'loss': 0.7803, 'learning_rate': 4.852926619447473e-11, 'epoch': 1.0}
{'loss': 0.8276, 'learning_rate': 3.105873940811854e-11, 'epoch': 1.0}
{'loss': 0.7725, 'learning_rate': 1.7470544874109706e-11, 'epoch': 1.0}
{'loss': 0.7368, 'learning_rate': 7.764687866007592e-12, 'epoch': 1.0}
{'loss': 0.7876, 'learning_rate': 1.9411721552398123e-12, 'epoch': 1.0}
{'loss': 0.3992, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 97380.0165, 'train_samples_per_second': 6.832, 'train_steps_per_second': 0.053, 'train_loss': 0.7711350256408714, 'epoch': 1.0}
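For reference, the total steps and samples of the run can be recovered from the summary line above, and their ratio gives the effective global batch size; the reported per-second rates are rounded, so these figures are approximate. A quick check:

train_runtime = 97380.0165            # seconds, from the summary line above (~27 hours)
samples_per_second = 6.832
steps_per_second = 0.053

total_samples = train_runtime * samples_per_second   # ~665,300 samples seen in one epoch
total_steps = train_runtime * steps_per_second       # ~5,160 optimizer steps
global_batch = total_samples / total_steps           # ~129, consistent with a global batch of about 128

print(round(total_samples), round(total_steps), round(global_batch))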