[2023-12-25 02:57:56,716] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-25 02:57:56,716] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-25 02:57:56,716] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-25 02:57:56,716] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-25 02:57:56,716] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-25 02:57:56,716] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-25 02:57:56,814] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-25 02:57:56,880] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-25 02:57:57,177] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-12-25 02:57:57,177] [INFO] [comm.py:616:init_distributed] cdb=None
[2023-12-25 02:57:57,177] [INFO] [comm.py:643:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2023-12-25 02:57:57,177] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-12-25 02:57:57,177] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-12-25 02:57:57,177] [INFO] [comm.py:616:init_distributed] cdb=None
[2023-12-25 02:57:57,177] [INFO] [comm.py:616:init_distributed] cdb=None
[2023-12-25 02:57:57,177] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-12-25 02:57:57,177] [INFO] [comm.py:616:init_distributed] cdb=None
[2023-12-25 02:57:57,177] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-12-25 02:57:57,177] [INFO] [comm.py:616:init_distributed] cdb=None
[2023-12-25 02:57:57,177] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-12-25 02:57:57,177] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-12-25 02:57:57,177] [INFO] [comm.py:616:init_distributed] cdb=None
[2023-12-25 02:57:57,177] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-12-25 02:57:57,177] [INFO] [comm.py:616:init_distributed] cdb=None
[2023-12-25 02:57:57,177] [INFO] [comm.py:616:init_distributed] cdb=None
12/25/2023 02:57:57 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 0
12/25/2023 02:57:57 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 4
12/25/2023 02:57:57 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 6
12/25/2023 02:57:57 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 7
12/25/2023 02:57:57 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 2
12/25/2023 02:57:57 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 3
12/25/2023 02:57:57 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 1
12/25/2023 02:57:57 - INFO - torch.distributed.distributed_c10d - Rank 24: Completed store-based barrier for key:store_based_barrier_key:1 with 32 nodes.
12/25/2023 02:57:57 - INFO - torch.distributed.distributed_c10d - Rank 31: Completed store-based barrier for key:store_based_barrier_key:1 with 32 nodes.
12/25/2023 02:57:57 - INFO - torch.distributed.distributed_c10d - Rank 25: Completed store-based barrier for key:store_based_barrier_key:1 with 32 nodes.
12/25/2023 02:57:57 - INFO - torch.distributed.distributed_c10d - Rank 28: Completed store-based barrier for key:store_based_barrier_key:1 with 32 nodes.
12/25/2023 02:57:57 - INFO - torch.distributed.distributed_c10d - Rank 26: Completed store-based barrier for key:store_based_barrier_key:1 with 32 nodes.
12/25/2023 02:57:57 - INFO - torch.distributed.distributed_c10d - Rank 30: Completed store-based barrier for key:store_based_barrier_key:1 with 32 nodes.
12/25/2023 02:57:57 - INFO - torch.distributed.distributed_c10d - Rank 29: Completed store-based barrier for key:store_based_barrier_key:1 with 32 nodes.
12/25/2023 02:57:57 - INFO - torch.distributed.distributed_c10d - Rank 27: Completed store-based barrier for key:store_based_barrier_key:1 with 32 nodes.
12/25/2023 02:57:57 - WARNING - utils.common - We recommend enable fp16 mixed precision training.
12/25/2023 02:57:57 - WARNING - utils.common - Please specify `prompt_template` if you are using other pre-trained models.
12/25/2023 02:57:57 - WARNING - utils.common - `ddp_find_unused_parameters` needs to be set as False in DDP training.
12/25/2023 02:57:57 - INFO - utils.common - Process rank: 0, device: cuda:0, n_gpu: 1 distributed training: True, 16-bits training: False
12/25/2023 02:57:57 - INFO - utils.common - Training/evaluation parameters Seq2SeqTrainingArguments( _n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, bf16=True, bf16_full_eval=False, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_pin_memory=True, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=False, ddp_timeout=1800, debug=[], deepspeed=ds_z3_no_offload.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=True, eval_accumulation_steps=None, eval_delay=0, eval_steps=None, evaluation_strategy=no, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, generation_config=None, generation_max_length=None, generation_num_beams=None, gradient_accumulation_steps=2, gradient_checkpointing=False, greater_is_better=None, group_by_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=None, hub_private_repo=False, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_inputs_for_metrics=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=2e-05, length_column_name=length, load_best_model_at_end=False, local_rank=0, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/group/20025/jiuding/ckpt/34B_final/runs/Dec25_02-57-56_vc-816648091-20231223-a1d29e4b-worker-3, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=10, logging_strategy=steps, lr_scheduler_type=cosine, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mp_parameters=, no_cuda=False, num_train_epochs=2.0, optim=adamw_torch, optim_args=None, output_dir=/group/20025/jiuding/ckpt/34B_final, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=8, per_device_train_batch_size=4, predict_with_generate=False, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, ray_scope=last, remove_unused_columns=True, report_to=[], resume_from_checkpoint=None, run_name=/group/20025/jiuding/ckpt/34B_final, save_on_each_node=False, save_safetensors=False, save_steps=500, save_strategy=epoch, save_total_limit=10, seed=42, sharded_ddp=[], skip_memory_metrics=True, sortish_sampler=False, tf32=None, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_mps_device=False, warmup_ratio=0.0, warmup_steps=0, weight_decay=0.0, )
12/25/2023 02:57:57 - WARNING - utils.common - We recommend enable fp16 mixed precision training.
12/25/2023 02:57:57 - WARNING - utils.common - Please specify `prompt_template` if you are using other pre-trained models.
12/25/2023 02:57:57 - WARNING - utils.common - We recommend enable fp16 mixed precision training.
12/25/2023 02:57:57 - WARNING - utils.common - `ddp_find_unused_parameters` needs to be set as False in DDP training.
12/25/2023 02:57:57 - WARNING - utils.common - Please specify `prompt_template` if you are using other pre-trained models.
12/25/2023 02:57:57 - WARNING - utils.common - `ddp_find_unused_parameters` needs to be set as False in DDP training.
12/25/2023 02:57:57 - INFO - utils.common - Process rank: 7, device: cuda:7, n_gpu: 1 distributed training: True, 16-bits training: False
12/25/2023 02:57:57 - INFO - utils.common - Process rank: 1, device: cuda:1, n_gpu: 1 distributed training: True, 16-bits training: False
12/25/2023 02:57:57 - INFO - utils.common - Training/evaluation parameters Seq2SeqTrainingArguments( _n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, bf16=True, bf16_full_eval=False, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_pin_memory=True, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=False, ddp_timeout=1800, debug=[], deepspeed=ds_z3_no_offload.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=True, eval_accumulation_steps=None, eval_delay=0, eval_steps=None, evaluation_strategy=no, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, generation_config=None, generation_max_length=None, generation_num_beams=None, gradient_accumulation_steps=2, gradient_checkpointing=False, greater_is_better=None, group_by_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=None, hub_private_repo=False, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_inputs_for_metrics=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=2e-05, length_column_name=length, load_best_model_at_end=False, local_rank=7, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/group/20025/jiuding/ckpt/34B_final/runs/Dec25_02-57-56_vc-816648091-20231223-a1d29e4b-worker-3, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=10, logging_strategy=steps, lr_scheduler_type=cosine, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mp_parameters=, no_cuda=False, num_train_epochs=2.0, optim=adamw_torch, optim_args=None, output_dir=/group/20025/jiuding/ckpt/34B_final, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=8, per_device_train_batch_size=4, predict_with_generate=False, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, ray_scope=last, remove_unused_columns=True, report_to=[], resume_from_checkpoint=None, run_name=/group/20025/jiuding/ckpt/34B_final, save_on_each_node=False, save_safetensors=False, save_steps=500, save_strategy=epoch, save_total_limit=10, seed=42, sharded_ddp=[], skip_memory_metrics=True, sortish_sampler=False, tf32=None, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_mps_device=False, warmup_ratio=0.0, warmup_steps=0, weight_decay=0.0, )
12/25/2023 02:57:57 - INFO - utils.common - Training/evaluation parameters Seq2SeqTrainingArguments( _n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, bf16=True, bf16_full_eval=False, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_pin_memory=True, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=False, ddp_timeout=1800, debug=[], deepspeed=ds_z3_no_offload.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=True, eval_accumulation_steps=None, eval_delay=0, eval_steps=None, evaluation_strategy=no, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, generation_config=None, generation_max_length=None, generation_num_beams=None, gradient_accumulation_steps=2, gradient_checkpointing=False, greater_is_better=None, group_by_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=None, hub_private_repo=False, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_inputs_for_metrics=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=2e-05, length_column_name=length, load_best_model_at_end=False, local_rank=1, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/group/20025/jiuding/ckpt/34B_final/runs/Dec25_02-57-56_vc-816648091-20231223-a1d29e4b-worker-3, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=10, logging_strategy=steps, lr_scheduler_type=cosine, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mp_parameters=, no_cuda=False, num_train_epochs=2.0, optim=adamw_torch, optim_args=None, output_dir=/group/20025/jiuding/ckpt/34B_final, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=8, per_device_train_batch_size=4, predict_with_generate=False, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, ray_scope=last, remove_unused_columns=True, report_to=[], resume_from_checkpoint=None, run_name=/group/20025/jiuding/ckpt/34B_final, save_on_each_node=False, save_safetensors=False, save_steps=500, save_strategy=epoch, save_total_limit=10, seed=42, sharded_ddp=[], skip_memory_metrics=True, sortish_sampler=False, tf32=None, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_mps_device=False, warmup_ratio=0.0, warmup_steps=0, weight_decay=0.0, )
12/25/2023 02:57:57 - WARNING - utils.common - We recommend enable fp16 mixed precision training.
12/25/2023 02:57:57 - WARNING - utils.common - Please specify `prompt_template` if you are using other pre-trained models.
12/25/2023 02:57:57 - WARNING - utils.common - `ddp_find_unused_parameters` needs to be set as False in DDP training.
12/25/2023 02:57:57 - INFO - utils.common - Process rank: 4, device: cuda:4, n_gpu: 1 distributed training: True, 16-bits training: False
12/25/2023 02:57:57 - INFO - utils.common - Training/evaluation parameters Seq2SeqTrainingArguments( _n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, bf16=True, bf16_full_eval=False, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_pin_memory=True, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=False, ddp_timeout=1800, debug=[], deepspeed=ds_z3_no_offload.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=True, eval_accumulation_steps=None, eval_delay=0, eval_steps=None, evaluation_strategy=no, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, generation_config=None, generation_max_length=None, generation_num_beams=None, gradient_accumulation_steps=2, gradient_checkpointing=False, greater_is_better=None, group_by_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=None, hub_private_repo=False, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_inputs_for_metrics=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=2e-05, length_column_name=length, load_best_model_at_end=False, local_rank=4, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/group/20025/jiuding/ckpt/34B_final/runs/Dec25_02-57-56_vc-816648091-20231223-a1d29e4b-worker-3, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=10, logging_strategy=steps, lr_scheduler_type=cosine, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mp_parameters=, no_cuda=False, num_train_epochs=2.0, optim=adamw_torch, optim_args=None, output_dir=/group/20025/jiuding/ckpt/34B_final, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=8, per_device_train_batch_size=4, predict_with_generate=False, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, ray_scope=last, remove_unused_columns=True, report_to=[], resume_from_checkpoint=None, run_name=/group/20025/jiuding/ckpt/34B_final, save_on_each_node=False, save_safetensors=False, save_steps=500, save_strategy=epoch, save_total_limit=10, seed=42, sharded_ddp=[], skip_memory_metrics=True, sortish_sampler=False, tf32=None, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_mps_device=False, warmup_ratio=0.0, warmup_steps=0, weight_decay=0.0, )
12/25/2023 02:57:57 - WARNING - utils.common - We recommend enable fp16 mixed precision training.
12/25/2023 02:57:57 - WARNING - utils.common - Please specify `prompt_template` if you are using other pre-trained models.
12/25/2023 02:57:57 - WARNING - utils.common - `ddp_find_unused_parameters` needs to be set as False in DDP training.
12/25/2023 02:57:57 - INFO - utils.common - Process rank: 2, device: cuda:2, n_gpu: 1 distributed training: True, 16-bits training: False
12/25/2023 02:57:57 - WARNING - utils.common - We recommend enable fp16 mixed precision training.
12/25/2023 02:57:57 - WARNING - utils.common - Please specify `prompt_template` if you are using other pre-trained models.
12/25/2023 02:57:57 - WARNING - utils.common - `ddp_find_unused_parameters` needs to be set as False in DDP training.
12/25/2023 02:57:57 - WARNING - utils.common - We recommend enable fp16 mixed precision training.
12/25/2023 02:57:57 - INFO - utils.common - Training/evaluation parameters Seq2SeqTrainingArguments( _n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, bf16=True, bf16_full_eval=False, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_pin_memory=True, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=False, ddp_timeout=1800, debug=[], deepspeed=ds_z3_no_offload.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=True, eval_accumulation_steps=None, eval_delay=0, eval_steps=None, evaluation_strategy=no, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, generation_config=None, generation_max_length=None, generation_num_beams=None, gradient_accumulation_steps=2, gradient_checkpointing=False, greater_is_better=None, group_by_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=None, hub_private_repo=False, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_inputs_for_metrics=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=2e-05, length_column_name=length, load_best_model_at_end=False, local_rank=2, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/group/20025/jiuding/ckpt/34B_final/runs/Dec25_02-57-56_vc-816648091-20231223-a1d29e4b-worker-3, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=10, logging_strategy=steps, lr_scheduler_type=cosine, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mp_parameters=, no_cuda=False, num_train_epochs=2.0, optim=adamw_torch, optim_args=None, output_dir=/group/20025/jiuding/ckpt/34B_final, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=8, per_device_train_batch_size=4, predict_with_generate=False, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, ray_scope=last, remove_unused_columns=True, report_to=[], resume_from_checkpoint=None, run_name=/group/20025/jiuding/ckpt/34B_final, save_on_each_node=False, save_safetensors=False, save_steps=500, save_strategy=epoch, save_total_limit=10, seed=42, sharded_ddp=[], skip_memory_metrics=True, sortish_sampler=False, tf32=None, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_mps_device=False, warmup_ratio=0.0, warmup_steps=0, weight_decay=0.0, )
12/25/2023 02:57:57 - WARNING - utils.common - Please specify `prompt_template` if you are using other pre-trained models.
12/25/2023 02:57:57 - INFO - utils.common - Process rank: 3, device: cuda:3, n_gpu: 1 distributed training: True, 16-bits training: False
12/25/2023 02:57:57 - WARNING - utils.common - `ddp_find_unused_parameters` needs to be set as False in DDP training.
12/25/2023 02:57:57 - INFO - utils.common - Process rank: 6, device: cuda:6, n_gpu: 1 distributed training: True, 16-bits training: False
12/25/2023 02:57:57 - WARNING - utils.common - We recommend enable fp16 mixed precision training.
12/25/2023 02:57:57 - WARNING - utils.common - Please specify `prompt_template` if you are using other pre-trained models.
12/25/2023 02:57:57 - WARNING - utils.common - `ddp_find_unused_parameters` needs to be set as False in DDP training.
12/25/2023 02:57:57 - INFO - utils.common - Process rank: 5, device: cuda:5, n_gpu: 1 distributed training: True, 16-bits training: False
12/25/2023 02:57:57 - INFO - utils.common - Training/evaluation parameters Seq2SeqTrainingArguments( _n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, bf16=True, bf16_full_eval=False, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_pin_memory=True, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=False, ddp_timeout=1800, debug=[], deepspeed=ds_z3_no_offload.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=True, eval_accumulation_steps=None, eval_delay=0, eval_steps=None, evaluation_strategy=no, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, generation_config=None, generation_max_length=None, generation_num_beams=None, gradient_accumulation_steps=2, gradient_checkpointing=False, greater_is_better=None, group_by_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=None, hub_private_repo=False, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_inputs_for_metrics=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=2e-05, length_column_name=length, load_best_model_at_end=False, local_rank=3, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/group/20025/jiuding/ckpt/34B_final/runs/Dec25_02-57-56_vc-816648091-20231223-a1d29e4b-worker-3, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=10, logging_strategy=steps, lr_scheduler_type=cosine, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mp_parameters=, no_cuda=False, num_train_epochs=2.0, optim=adamw_torch, optim_args=None, output_dir=/group/20025/jiuding/ckpt/34B_final, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=8, per_device_train_batch_size=4, predict_with_generate=False, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, ray_scope=last, remove_unused_columns=True, report_to=[], resume_from_checkpoint=None, run_name=/group/20025/jiuding/ckpt/34B_final, save_on_each_node=False, save_safetensors=False, save_steps=500, save_strategy=epoch, save_total_limit=10, seed=42, sharded_ddp=[], skip_memory_metrics=True, sortish_sampler=False, tf32=None, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_mps_device=False, warmup_ratio=0.0, warmup_steps=0, weight_decay=0.0, )
12/25/2023 02:57:57 - INFO - utils.common - Training/evaluation parameters Seq2SeqTrainingArguments( _n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, bf16=True, bf16_full_eval=False, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_pin_memory=True, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=False, ddp_timeout=1800, debug=[], deepspeed=ds_z3_no_offload.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=True, eval_accumulation_steps=None, eval_delay=0, eval_steps=None, evaluation_strategy=no, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, generation_config=None, generation_max_length=None, generation_num_beams=None, gradient_accumulation_steps=2, gradient_checkpointing=False, greater_is_better=None, group_by_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=None, hub_private_repo=False, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_inputs_for_metrics=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=2e-05, length_column_name=length, load_best_model_at_end=False, local_rank=6, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/group/20025/jiuding/ckpt/34B_final/runs/Dec25_02-57-56_vc-816648091-20231223-a1d29e4b-worker-3, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=10, logging_strategy=steps, lr_scheduler_type=cosine, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mp_parameters=, no_cuda=False, num_train_epochs=2.0, optim=adamw_torch, optim_args=None, output_dir=/group/20025/jiuding/ckpt/34B_final, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=8, per_device_train_batch_size=4, predict_with_generate=False, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, ray_scope=last, remove_unused_columns=True, report_to=[], resume_from_checkpoint=None, run_name=/group/20025/jiuding/ckpt/34B_final, save_on_each_node=False, save_safetensors=False, save_steps=500, save_strategy=epoch, save_total_limit=10, seed=42, sharded_ddp=[], skip_memory_metrics=True, sortish_sampler=False, tf32=None, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_mps_device=False, warmup_ratio=0.0, warmup_steps=0, weight_decay=0.0, )
12/25/2023 02:57:57 - INFO - utils.common - Training/evaluation parameters Seq2SeqTrainingArguments( _n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, bf16=True, bf16_full_eval=False, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_pin_memory=True, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=False, ddp_timeout=1800, debug=[], deepspeed=ds_z3_no_offload.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=True, eval_accumulation_steps=None, eval_delay=0, eval_steps=None, evaluation_strategy=no, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, generation_config=None, generation_max_length=None, generation_num_beams=None, gradient_accumulation_steps=2, gradient_checkpointing=False, greater_is_better=None, group_by_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=None, hub_private_repo=False, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_inputs_for_metrics=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=2e-05, length_column_name=length, load_best_model_at_end=False, local_rank=5, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/group/20025/jiuding/ckpt/34B_final/runs/Dec25_02-57-56_vc-816648091-20231223-a1d29e4b-worker-3, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=10, logging_strategy=steps, lr_scheduler_type=cosine, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mp_parameters=, no_cuda=False, num_train_epochs=2.0, optim=adamw_torch, optim_args=None, output_dir=/group/20025/jiuding/ckpt/34B_final, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=8, per_device_train_batch_size=4, predict_with_generate=False, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, ray_scope=last, remove_unused_columns=True, report_to=[], resume_from_checkpoint=None, run_name=/group/20025/jiuding/ckpt/34B_final, save_on_each_node=False, save_safetensors=False, save_steps=500, save_strategy=epoch, save_total_limit=10, seed=42, sharded_ddp=[], skip_memory_metrics=True, sortish_sampler=False, tf32=None, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_mps_device=False, warmup_ratio=0.0, warmup_steps=0, weight_decay=0.0, )
12/25/2023 02:57:57 - INFO - utils.common - Using FlashAttention-2 for faster training and inference.
12/25/2023 02:57:57 - INFO - utils.common - Using FlashAttention-2 for faster training and inference.
12/25/2023 02:57:57 - INFO - utils.common - Using FlashAttention-2 for faster training and inference.
12/25/2023 02:57:57 - INFO - utils.common - Using FlashAttention-2 for faster training and inference.
12/25/2023 02:57:57 - INFO - utils.common - Using FlashAttention-2 for faster training and inference.
12/25/2023 02:57:57 - INFO - utils.common - Using FlashAttention-2 for faster training and inference.
12/25/2023 02:57:57 - INFO - utils.common - Using FlashAttention-2 for faster training and inference.
12/25/2023 02:57:57 - INFO - utils.common - Using FlashAttention-2 for faster training and inference.
vc-816648091-20231223-a1d29e4b-worker-1:5785:5785 [0] NCCL INFO cudaDriverVersion 11080
vc-816648091-20231223-a1d29e4b-worker-1:5785:5785 [0] NCCL INFO Bootstrap : Using eth1:11.220.53.97<0>
vc-816648091-20231223-a1d29e4b-worker-1:5785:5785 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
vc-816648091-20231223-a1d29e4b-worker-1:5785:6667 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
vc-816648091-20231223-a1d29e4b-worker-1:5789:5789 [4] NCCL INFO cudaDriverVersion 11080
vc-816648091-20231223-a1d29e4b-worker-1:5789:5789 [4] NCCL INFO Bootstrap : Using eth1:11.220.53.97<0>
vc-816648091-20231223-a1d29e4b-worker-1:5789:5789 [4] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
vc-816648091-20231223-a1d29e4b-worker-1:5789:6668 [4] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
vc-816648091-20231223-a1d29e4b-worker-1:5785:6667 [0] NCCL INFO NET/IB : Using [0]mlx5_2:1/RoCE [RO]; OOB eth1:11.220.53.97<0>
vc-816648091-20231223-a1d29e4b-worker-1:5785:6667 [0] NCCL INFO Using network IB
vc-816648091-20231223-a1d29e4b-worker-1:5789:6668 [4] NCCL INFO NET/IB : Using [0]mlx5_2:1/RoCE [RO]; OOB eth1:11.220.53.97<0>
vc-816648091-20231223-a1d29e4b-worker-1:5789:6668 [4] NCCL INFO Using network IB
vc-816648091-20231223-a1d29e4b-worker-1:5792:5792 [7] NCCL INFO cudaDriverVersion 11080
vc-816648091-20231223-a1d29e4b-worker-1:5791:5791 [6] NCCL INFO cudaDriverVersion 11080
vc-816648091-20231223-a1d29e4b-worker-1:5786:5786 [1] NCCL INFO cudaDriverVersion 11080
vc-816648091-20231223-a1d29e4b-worker-1:5792:5792 [7] NCCL INFO Bootstrap : Using eth1:11.220.53.97<0>
vc-816648091-20231223-a1d29e4b-worker-1:5792:5792 [7] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
vc-816648091-20231223-a1d29e4b-worker-1:5791:5791 [6] NCCL INFO Bootstrap : Using eth1:11.220.53.97<0>
vc-816648091-20231223-a1d29e4b-worker-1:5791:5791 [6] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
vc-816648091-20231223-a1d29e4b-worker-1:5786:5786 [1] NCCL INFO Bootstrap : Using eth1:11.220.53.97<0>
vc-816648091-20231223-a1d29e4b-worker-1:5786:5786 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
vc-816648091-20231223-a1d29e4b-worker-1:5788:5788 [3] NCCL INFO cudaDriverVersion 11080
vc-816648091-20231223-a1d29e4b-worker-1:5792:6671 [7] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
vc-816648091-20231223-a1d29e4b-worker-1:5791:6672 [6] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
vc-816648091-20231223-a1d29e4b-worker-1:5788:5788 [3] NCCL INFO Bootstrap : Using eth1:11.220.53.97<0>
vc-816648091-20231223-a1d29e4b-worker-1:5788:5788 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
vc-816648091-20231223-a1d29e4b-worker-1:5786:6673 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
vc-816648091-20231223-a1d29e4b-worker-1:5788:6674 [3] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
vc-816648091-20231223-a1d29e4b-worker-1:5792:6671 [7] NCCL INFO NET/IB : Using [0]mlx5_2:1/RoCE [RO]; OOB eth1:11.220.53.97<0>
vc-816648091-20231223-a1d29e4b-worker-1:5792:6671 [7] NCCL INFO Using network IB
vc-816648091-20231223-a1d29e4b-worker-1:5791:6672 [6] NCCL INFO NET/IB : Using [0]mlx5_2:1/RoCE [RO]; OOB eth1:11.220.53.97<0>
vc-816648091-20231223-a1d29e4b-worker-1:5791:6672 [6] NCCL INFO Using network IB
vc-816648091-20231223-a1d29e4b-worker-1:5786:6673 [1] NCCL INFO NET/IB : Using [0]mlx5_2:1/RoCE [RO]; OOB eth1:11.220.53.97<0>
vc-816648091-20231223-a1d29e4b-worker-1:5786:6673 [1] NCCL INFO Using network IB
vc-816648091-20231223-a1d29e4b-worker-1:5788:6674 [3] NCCL INFO NET/IB : Using [0]mlx5_2:1/RoCE [RO]; OOB eth1:11.220.53.97<0>
vc-816648091-20231223-a1d29e4b-worker-1:5788:6674 [3] NCCL INFO Using network IB
vc-816648091-20231223-a1d29e4b-worker-1:5790:5790 [5] NCCL INFO cudaDriverVersion 11080
vc-816648091-20231223-a1d29e4b-worker-1:5790:5790 [5] NCCL INFO Bootstrap : Using eth1:11.220.53.97<0>
vc-816648091-20231223-a1d29e4b-worker-1:5790:5790 [5] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
vc-816648091-20231223-a1d29e4b-worker-1:5790:6679 [5] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
vc-816648091-20231223-a1d29e4b-worker-1:5790:6679 [5] NCCL INFO NET/IB : Using [0]mlx5_2:1/RoCE [RO]; OOB eth1:11.220.53.97<0>
vc-816648091-20231223-a1d29e4b-worker-1:5790:6679 [5] NCCL INFO Using network IB
vc-816648091-20231223-a1d29e4b-worker-1:5787:5787 [2] NCCL INFO cudaDriverVersion 11080
vc-816648091-20231223-a1d29e4b-worker-1:5787:5787 [2] NCCL INFO Bootstrap : Using eth1:11.220.53.97<0>
vc-816648091-20231223-a1d29e4b-worker-1:5787:5787 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
vc-816648091-20231223-a1d29e4b-worker-1:5787:6681 [2] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
vc-816648091-20231223-a1d29e4b-worker-1:5787:6681 [2] NCCL INFO NET/IB : Using [0]mlx5_2:1/RoCE [RO]; OOB eth1:11.220.53.97<0>
vc-816648091-20231223-a1d29e4b-worker-1:5787:6681 [2] NCCL INFO Using network IB
vc-816648091-20231223-a1d29e4b-worker-2:5780:6661 [6] NCCL INFO Setting affinity for GPU 6 to ffffffff,ffff0000,000
vc-816648091-20231223-a1d29e4b-worker-0:5800:6675 [6] NCCL INFO Setting affinity for GPU 6 to ffffffff,ffff0000,00000000,ffffffff,ffff0000,00000000
vc-816648091-20231223-a1d29e4b-worker-0:5795:6674 [1] NCCL INFO Setting affinity for GPU 1 to ffff,ffffffff,00000000,0000ffff,ffffffff
vc-816648091-20231223-a1d29e4b-worker-0:5794:6672 [0] NCCL INFO Setting affinity for GPU 0 to ffff,ffffffff,00000000,0000ffff,ffffffff
vc-816648091-20231223-a1d29e4b-worker-0:5801:6677 [7] NCCL INFO Setting affinity for GPU 7 to ffffffff,ffff0000,00000000,ffffffff,ffff0000,00000000
vc-816648091-20231223-a1d29e4b-worker-0:5799:6678 [5] NCCL INFO Setting affinity for GPU 5 to ffffffff,ffff0000,00000000,ffffffff,ffff0000,00000000
vc-816648091-20231223-a1d29e4b-worker-0:5798:6676 [4] NCCL INFO Setting affinity for GPU 4 to ffffffff,ffff0000,00000000,ffffffff,ffff0000,00000000
vc-816648091-20231223-a1d29e4b-worker-0:5797:6673 [3] NCCL INFO Setting affinity for GPU 3 to ffff,ffffffff,00000000,0000ffff,ffffff
vc-816648091-20231223-a1d29e4b-worker-3:6071:6957 [2] NCCL INFO Trees [0] 27/-1/-1->26->18
vc-816648091-20231223-a1d29e4b-worker-1:5785:6667 [0] NCCL INFO Trees [0] 9/-1/-1->8->15 [1] 9/-1/-1->8->15
vc-816648091-20231223-a1d29e4b-worker-3:6069:6956 [0] NCCL INFO Trees [0] 25/-1/-1->24->31 [1] 25/-1/-1->24->31
vc-816648091-20231223-a1d29e4b-worker-3:6073:6953 [4] NCCL INFO Trees [0] 29/-1/-1->28->27 [1] 29/-1/-1->28->27
vc-816648091-20231223-a1d29e4b-worker-3:6076:6954 [7] NCCL INFO Trees [0] 24/-1/-1->31->30 [1] 24/-1/-1->31->30
vc-816648091-20231223-a1d29e4b-worker-3:6074:6959 [5] NCCL INFO Trees [0] 30/-1/-1->29->28 [1] 30/-1/-1->29->28
vc-816648091-20231223-a1d29e4b-worker-3:6070:6952 [1] NCCL INFO Trees [0] -1/-1/-1->25->24 [1] -1/-1/-1->25->24
vc-816648091-20231223-a1d29e4b-worker-3:6075:6958 [6] NCCL INFO Trees [0] 31/-1/-
vc-816648091-20231223-a1d29e4b-worker-3:6073:6953 [4] NCCL INFO Channel 00/0 : 28[93000] -> 25[13000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-3:6071:6957 [2] NCCL INFO Channel 00/0 : 26[4b000] -> 31[d0000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-2:5774:6660 [0] NCCL INFO Channel 00/0 : 16[e000
vc-816648091-20231223-a1d29e4b-worker-2:5778:6662 [4] NCCL INFO Channel 01/0 : 20[93000
vc-816648091-20231223-a1d29e4b-worker-1:5785:6667 [0] NCCL INFO Channel 01/0 : 8[e000] -> 11[51000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5789:6668 [4] NCCL INFO Channel 01/0 : 12[93000] -> 9[13000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5791:6672 [6] NCCL INFO Channel 00/0 : 14[cb000] -> 13[99000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5790:6679 [5] NCCL INFO Channel 00/0 : 13[99000] -> 12[93000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5791:6672 [6] NCCL INFO Channel 01/0 : 14[cb000] -> 13[99000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5790:6679 [5] NCCL INFO Channel 01/0 : 13[99000] -> 12[93000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5792:6671 [7] NCCL INFO Channel 00/0 : 15[d0000] -> 14[cb000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5786:6673 [1] NCCL INFO Channel 00/0 : 9[13000] -> 8[e000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5792:6671 [7] NCCL INFO Channel 01/0 : 15[d0000] -> 14[cb000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5786:6673 [1] NCCL INFO Channel 01/0 : 9[13000] -> 8[e000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5792:6671 [7] NCCL INFO Connected all rings
vc-816648091-20231223-a1d29e4b-worker-1:5785:6667 [0] NCCL INFO Connected all rings
vc-816648091-20231223-a1d29e4b-worker-1:5785:6667 [0] NCCL INFO Channel 00/0 : 8[e000] -> 9[13000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5789:6668 [4] NCCL INFO Connected all rings
vc-816648091-20231223-a1d29e4b-worker-1:5786:6673 [1] NCCL INFO Connected all rings
vc-816648091-20231223-a1d29e4b-worker-1:5785:6667 [0] NCCL INFO Channel 01/0 : 8[e000] -> 9[13000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5791:6672 [6] NCCL INFO Connected all rings
vc-816648091-20231223-a1d29e4b-worker-1:5790:6679 [5] NCCL INFO Connected all rings
vc-816648091-20231223-a1d29e4b-worker-1:5789:6668 [4] NCCL INFO Channel 00/0 : 12[93000] -> 13[99000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5789:6668 [4] NCCL INFO Channel 01/0 : 12[93000] -> 13[99000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5791:6672 [6] NCCL INFO Channel 00/0 : 14[cb000] -> 15[d0000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5785:6667 [0] NCCL INFO Channel 00/0 : 8[e000] -> 15[d0000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5786:6673 [1] NCCL INFO Connected all trees
vc-816648091-20231223-a1d29e4b-worker-1:5786:6673 [1] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512
vc-816648091-20231223-a1d29e4b-worker-1:5786:6673 [1] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
vc-816648091-20231223-a1d29e4b-worker-1:5790:6679 [5] NCCL INFO Channel 00/0 : 13[99000] -> 14[cb000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5791:6672 [6] NCCL INFO Channel 01/0 : 14[cb000] -> 15[d0000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5785:6667 [0] NCCL INFO Channel 01/0 : 8[e000] -> 15[d0000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5790:6679 [5] NCCL INFO Channel 01/0 : 13[99000] -> 14[cb000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5791:6672 [6] NCCL INFO Connected all trees
vc-816648091-20231223-a1d29e4b-worker-1:5791:6672 [6] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512
vc-816648091-20231223-a1d29e4b-worker-1:5791:6672 [6] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
vc-816648091-20231223-a1d29e4b-worker-1:5790:6679 [5] NCCL INFO Connected all trees
vc-816648091-20231223-a1d29e4b-worker-1:5790:6679 [5] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512
vc-816648091-20231223-a1d29e4b-worker-1:5790:6679 [5] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
vc-816648091-20231223-a1d29e4b-worker-1:5792:6671 [7] NCCL INFO Channel 00/0 : 15[d0000] -> 8[e000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5792:6671 [7] NCCL INFO Channel 01/0 : 15[d0000] -> 8[e000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5792:6671 [7] NCCL INFO Connected all trees
vc-816648091-20231223-a1d29e4b-worker-1:5792:6671 [7] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512
vc-816648091-20231223-a1d29e4b-worker-1:5792:6671 [7] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
vc-816648091-20231223-a1d29e4b-worker-1:5785:6667 [0] NCCL INFO Connected all trees
vc-816648091-20231223-a1d29e4b-worker-1:5785:6667 [0] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512
vc-816648091-20231223-a1d29e4b-worker-1:5785:6667 [0] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
vc-816648091-20231223-a1d29e4b-worker-1:5788:6674 [3] NCCL INFO Channel 00/0 : 11[51000] -> 18[4b000] [send] via NET/IB/0/GDRDMA
vc-816648091-20231223-a1d29e4b-worker-1:5787:6681 [2] NCCL INFO Channel 00/0 : 3[51000] -> 10[4b000] [receive] via NET/IB/0/GDRDMA
vc-816648091-20231223-a1d29e4b-worker-1:5788:6674 [3] NCCL INFO Channel 01/0 : 11[51000] -> 18[4b000] [send] via NET/IB/0/GDRDMA
vc-816648091-20231223-a1d29e4b-worker-1:5788:6691 [3] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3.
vc-816648091-20231223-a1d29e4b-worker-1:5787:6681 [2] NCCL INFO Channel 01/0 : 3[51000] -> 10[4b000] [receive] via NET/IB/0/GDRDMA
vc-816648091-20231223-a1d29e4b-worker-1:5787:6688 [2] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3.
vc-816648091-20231223-a1d29e4b-worker-1:5788:6674 [3] NCCL INFO Connected all rings
vc-816648091-20231223-a1d29e4b-worker-1:5788:6674 [3] NCCL INFO Channel 00/0 : 11[51000] -> 12[93000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5788:6674 [3] NCCL INFO Channel 01/0 : 11[51000] -> 12[93000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5787:6681 [2] NCCL INFO Connected all rings
vc-816648091-20231223-a1d29e4b-worker-1:5787:6681 [2] NCCL INFO Channel 00/0 : 10[4b000] -> 11[51000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5787:6681 [2] NCCL INFO Channel 01/0 : 10[4b000] -> 11[51000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5789:6668 [4] NCCL INFO Channel 00/0 : 12[93000] -> 11[51000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5789:6668 [4] NCCL INFO Channel 01/0 : 12[93000] -> 11[51000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5787:6681 [2] NCCL INFO Channel 01/0 : 2[4b000] -> 10[4b000] [receive] via NET/IB/0/GDRDMA
vc-816648091-20231223-a1d29e4b-worker-1:5788:6674 [3] NCCL INFO Channel 01/0 : 18[4b000] -> 11[51000] [receive] via NET/IB/0/GDRDMA
vc-816648091-20231223-a1d29e4b-worker-1:5787:6681 [2] NCCL INFO Channel 00/0 : 10[4b000] -> 19[51000] [send] via NET/IB/0/GDRDMA
vc-816648091-20231223-a1d29e4b-worker-1:5787:6681 [2] NCCL INFO Channel 01/0 : 26[4b000] -> 10[4b000] [receive] via NET/IB/0/GDRDMA
vc-816648091-20231223-a1d29e4b-worker-1:5787:6681 [2] NCCL INFO Channel 01/0 : 10[4b000] -> 26[4b000] [send] via NET/IB/0/GDRDMA
vc-816648091-20231223-a1d29e4b-worker-1:5787:6681 [2] NCCL INFO Channel 00/0 : 19[51000] -> 10[4b000] [receive] via NET/IB/0/GDRDMA
vc-816648091-20231223-a1d29e4b-worker-1:5787:6681 [2] NCCL INFO Channel 01/0 : 10[4b000] -> 2[4b000] [send] via NET/IB/0/GDRDMA
vc-816648091-20231223-a1d29e4b-worker-1:5788:6674 [3] NCCL INFO Channel 00/0 : 11[51000] -> 10[4b000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5788:6674 [3] NCCL INFO Channel 01/0 : 11[51000] -> 10[4b000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5789:6668 [4] NCCL INFO Connected all trees
vc-816648091-20231223-a1d29e4b-worker-1:5789:6668 [4] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512
vc-816648091-20231223-a1d29e4b-worker-1:5789:6668 [4] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
vc-816648091-20231223-a1d29e4b-worker-1:5787:6681 [2] NCCL INFO Connected all trees
vc-816648091-20231223-a1d29e4b-worker-1:5787:6681 [2] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512
vc-816648091-20231223-a1d29e4b-worker-1:5787:6681 [2] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
vc-816648091-20231223-a1d29e4b-worker-1:5788:6674 [3] NCCL INFO Connected all trees
vc-816648091-20231223-a1d29e4b-worker-1:5788:6674 [3] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512
vc-816648091-20231223-a1d29e4b-worker-1:5788:6674 [3] NCCL INFO 2 coll channels, 2 p2p channels, 2
vc-816648091-20231223-a1d29e4b-worker-1:5788:6674 [3] NCCL INFO comm 0x68449270 rank 11 nranks 32 cudaDev 3 busId 51000 - Init COMPLETE
vc-816648091-20231223-a1d29e4b-worker-1:5787:6681 [2] NCCL INFO comm 0x68d79a60 rank 10 nranks 32 cudaDev 2 busId 4b000 - Init COMPLETE
vc-816648091-20231223-a1d29e4b-worker-1:5789:6668 [4] NCCL INFO comm 0x6ab9dec0 rank 12 nranks 32 cudaDev 4 busId 93000 - Init COMPLETE
vc-816648091-20231223-a1d29e4b-worker-1:5785:6667 [0] NCCL INFO comm 0x67496240 rank 8 nranks 32 cudaDev 0 busId e000 - Init COMPLETE
vc-816648091-20231223-a1d29e4b-worker-1:5786:6673 [1] NCCL INFO comm 0x6996c5b0 rank 9 nranks 32 cudaDev 1 busId 13000 - Init COMPLETE
vc-816648091-20231223-a1d29e4b-worker-1:5791:6672 [6] NCCL INFO co
12/25/2023 02:59:19 - INFO - utils.common - Fine-tuning method: Full
trainable params: 33744052224 || all params: 33744052224 || trainable%: 100.0000
length of tokenizer: 32005
pad_token_id is: 32004
length of input_ids : 394
input_ids: [1, 518, 25580, 29962, 12148, 626, 355, 278, 15352, 5132, 2471, 577, 393, 372, 7805, 263, 525, 8000, 29915, 2425, 3265, 1135, 278, 5923, 525, 1454, 29915, 2425, 29892, 607, 4256, 1078, 1549, 278, 4452, 310, 385, 6043, 1051, 29889, 13, 13, 1576, 2471, 5279, 756, 263, 6494, 988, 372, 14734, 304, 1596, 385, 1203, 393, 338, 5377, 278, 13451, 310, 278, 1051, 29889, 24778, 445, 1059, 322, 6623, 278, 2471, 304, 671, 525, 8000, 29915, 2012, 310, 525, 1454, 29915, 2425, 29889, 22521, 545, 596, 2471, 5149, 17766, 4069, 8857, 29889, 29871, 13, 13, 28956, 4691, 13, 29871, 396, 2661, 370, 1674, 385, 6043, 1051, 13, 29871, 3948, 353, 518, 29896, 29892, 29871, 29906, 29892, 29871, 29941, 29892, 29871, 29946, 29962, 13, 13, 29871, 396, 5953, 837, 457, 278, 3309, 310, 278, 1051, 13, 29871, 302, 353, 7431, 29898, 2749, 29897, 13, 13, 29871, 396, 3201, 3901, 278, 1051, 322, 1962, 1269, 5375, 1543, 13, 29871, 363, 474, 297, 3464, 29898, 29876, 29974, 29896, 1125, 13, 418, 1596, 29898, 2749, 29961, 29875, 2314, 13, 28956, 29961, 29914, 25580, 29962, 7521, 4691, 13, 29937, 2661, 370, 1674, 385, 6043, 1051, 13, 2749, 353, 518, 29896, 29892, 29871, 29906, 29892, 29871, 29941, 29892, 29871, 29946, 29962, 13, 13, 29937, 5953, 837, 457, 278, 3309, 310, 278, 1051, 13, 29876, 353, 7431, 29898, 2749, 29897, 13, 13, 29937, 25455, 2380, 472, 29871, 29900, 13, 29875, 353, 29871, 29900, 13, 13, 29937, 3201, 3901, 278, 1051, 322, 1962, 1269, 5375, 1543, 13, 8000, 474, 529, 302, 29901, 13, 1678, 1596, 29898, 2749, 29961, 29875, 2314, 13, 1678, 474, 4619, 29871, 29896, 13, 28956, 13, 797, 278, 2183, 775, 29892, 372, 14335, 304, 2130, 421, 2749, 29961, 29876, 7961, 607, 338, 714, 310, 13451, 408, 3017, 1051, 26190, 8665, 515, 29871, 29900, 322, 10614, 472, 302, 29899, 29896, 363, 263, 1051, 310, 3309, 302, 29889, 13, 17351, 29892, 306, 1304, 421, 8000, 29952, 2425, 408, 639, 596, 2009, 29889, 2648, 2847, 5281, 421, 29875, 29922, 29900, 29952, 1434, 278, 2425, 322, 421, 29875, 4619, 29871, 29896, 29952, 2629, 278, 2425, 29892, 591, 508, 13649, 975, 278, 1051, 29889, 910, 12541, 18172, 2745, 421, 29875, 29952, 338, 3109, 1135, 421, 29876, 1673, 278, 3309, 310, 278, 1051, 29889, 450, 421, 8000, 29952, 2425, 674, 451, 8469, 565, 278, 1051, 338, 4069, 577, 372, 5149, 17766, 4069, 8857, 29889, 2]
inputs: [INST]Please amend the subsequent Python script so that it includes a 'while' loop rather than the existing 'for' loop, which iterates through the items of an integer list. The script currently has a bug where it attempts to print an object that is outside the bounds of the list. Fix this error and modify the script to use 'while' instead of 'for' loop. Ensure your script correctly handles empty lists.

```python
  # Establish an integer list
  arr = [1, 2, 3, 4]

  # Determine the length of the list
  n = len(arr)

  # Traverse the list and output each individual element
  for i in range(n+1):
      print(arr[i])
```[/INST] ```python
# Establish an integer list
arr = [1, 2, 3, 4]

# Determine the length of the list
n = len(arr)

# Initialize index at 0
i = 0

# Traverse the list and output each individual element
while i < n:
    print(arr[i])
    i += 1
```
In the given code, it tries to access `arr[n]` which is out of bounds as python list indexing starts from 0 and ends at n-1 for a list of length n.
Also, I used `while` loop as per your request. By initializing `i=0` before the loop and `i += 1` within the loop, we can iterate over the list. This iteration continues until `i` is less than `n`, the length of the list. The `while` loop will not proceed if the list is empty so it correctly handles empty lists.
label_ids
labels
```python
# Establish an integer list
arr = [1, 2, 3, 4]

# Determine the length of the list
n = len(arr)

# Initialize index at 0
i = 0

# Traverse the list and output each individual element
while i < n:
    print(arr[i])
    i += 1
```
In the given code, it tries to access `arr[n]` which is out of bounds as python list indexing starts from 0 and ends at n-1 for a list of length n.
Also, I used `while` loop as per your request. By initializing `i=0` before the loop and `i += 1` within the loop, we can iterate over the list. This iteration continues until `i` is less than `n`, the length of the list. The `while` loop will not proceed if the list is empty so it correctly handles empty lists.
length of input_ids : 1212
length of input_ids : 394
input_ids: [1, 518, 25580, 29962, 12148, 626, 355, 278, 15352, 5132, 2471, 577, 393, 372, 7805, 263, 525, 8000, 29915, 2425, 3265, 1135, 278, 5923, 525, 1454, 29915, 2425, 29892, 607, 4256, 1078, 1549, 278, 4452, 310, 385, 6043, 1051, 29889, 13, 13, 1576, 2471, 5279, 756, 263, 6494, 988, 372, 14734, 304, 1596, 385, 1203, 393, 338, 5377, 278, 13451, 310, 278, 1051, 29889, 24778, 445, 1059, 322, 6623, 278, 2471, 304, 671, 525, 8000, 29915, 2012, 310, 525, 1454, 29915, 2425, 29889, 22521, 545, 596, 2471, 5149, 17766, 4069, 8857, 29889, 29871, 13, 13, 28956, 4691, 13, 29871, 396, 2661, 370, 1674, 385, 6043, 1051, 13, 29871, 3948, 353, 518, 29896, 29892, 29871, 29906, 29892, 29871, 29941, 29892, 29871, 29946, 29962, 13, 13, 29871, 396, 5953, 837, 457, 278, 3309, 310, 278, 1051, 13, 29871, 302, 353, 7431, 29898, 2749, 29897, 13, 13, 29871, 396, 3201, 3901, 278, 1051, 322, 1962, 1269, 5375, 1543, 13, 29871, 363, 474, 297, 3464, 29898, 29876, 29974, 29896, 1125, 13, 418, 1596, 29898, 2749, 29961, 29875, 2314, 13, 28956, 29961, 29914, 25580, 29962, 7521, 4691, 13, 29937, 2661, 370, 1674, 385, 6043, 1051, 13, 2749, 353, 518, 29896, 29892, 29871, 29906, 29892, 29871, 29941, 29892, 29871, 29946, 29962, 13, 13, 29937, 5953, 837, 457,
278, 3309, 310, 278, 1051, 13, 29876, 353, 7431, 29898, 2749, 29897, 13, 13, 29937, 25455, 2380, 472, 29871, 29900, 13, 29875, 353, 29871, 29900, 13, 13, 29937, 3201, 3901, 278, 1051, 322, 1962, 1269, 5375, 1543, 13, 8000, 474, 529, 302, 29901, 13, 1678, 1596, 29898, 2749, 29961, 29875, 2314, 13, 1678, 474, 4619, 29871, 29896, 13, 28956, 13, 797, 278, 2183, 775, 29892, 372, 14335, 304, 2130, 421, 2749, 29961, 29876, 7961, 607, 338, 714, 310, 13451, 408, 3017, 1051, 26190, 8665, 515, 29871, 29900, 322, 10614, 472, 302, 29899, 29896, 363, 263, 1051, 310, 3309, 302, 29889, 13, 17351, 29892, 306, 1304, 421, 8000, 29952, 2425, 408, 639, 596, 2009, 29889, 2648, 2847, 5281, 421, 29875, 29922, 29900, 29952, 1434, 278, 2425, 322, 421, 29875, 4619, 29871, 29896, 29952, 2629, 278, 2425, 29892, 591, 508, 13649, 975, 278, 1051, 29889, 910, 12541, 18172, 2745, 421, 29875, 29952, 338, 3109, 1135, 421, 29876, 1673, 278, 3309, 310, 278, 1051, 29889, 450, 421, 8000, 29952, 2425, 674, 451, 8469, 565, 278, 1051, 338, 4069, 577, 372, 5149, 17766, 4069, 8857, 29889, 2] input_ids: [1, 518, 25580, 29962, 29875, 29915, 345, 2355, 445, 3017, 775, 515, 385, 288, 7283, 5780, 29892, 541, 372, 29915, 29879, 451, 1985, 29889, 508, 366, 4744, 372, 363, 592, 29973, 13, 1990, 29871, 29945, 324, 29894, 29955, 29991, 265, 29901, 13, 29877, 300, 260, 513, 21322, 30181, 303, 11457, 29876, 1566, 761, 29952, 302, 29876, 29987, 29879, 29955, 261, 29901, 525, 5756, 29924, 29987, 29879, 29974, 261, 1495, 448, 29966, 29871, 29896, 2273, 29901, 13, 1311, 29889, 29877, 29896, 29878, 353, 11117, 29965, 2396, 21069, 29896, 29892, 29871, 29900, 1822, 29918, 29915, 29900, 2396, 518, 29896, 29892, 29871, 29900, 1118, 525, 29931, 2396, 518, 29900, 29892, 29918, 29899, 29896, 1822, 525, 29934, 2396, 426, 29900, 29892, 29871, 29896, 12258, 13, 29879, 2152, 29889, 11569, 29918, 29881, 29991, 29878, 448, 11117, 29963, 2396, 525, 29928, 742, 525, 29928, 2396, 525, 29965, 742, 525, 29931, 2396, 525, 29934, 742, 525, 
29934, 2396, 525, 29931, 10827, 13, 29920, 30181, 29989, 29888, 29889, 29887, 5378, 353, 313, 324, 781, 29875, 4835, 29889, 4381, 27774, 29955, 29898, 2892, 29901, 29918, 29888, 417, 29946, 29873, 877, 3083, 29888, 11287, 13, 29937, 360, 9998, 2791, 714, 278, 2989, 2910, 13, 29879, 30181, 29989, 29888, 29889, 12637, 479, 29974, 448, 10050, 30181, 13, 29945, 761, 29889, 29069, 3552, 29900, 29892, 29871, 29900, 511, 286, 29987, 303, 29941, 29878, 29897, 13, 29896, 29888, 395, 29872, 277, 29892, 29873, 29946, 29878, 657, 29918, 275, 390, 650, 29901, 13, 2267, 13416, 29876, 448, 29896, 13, 29950, 29871, 29947, 9998, 1284, 278, 5807, 29900, 29878, 2167, 29955, 29871, 29929, 29992, 29974, 29882, 13, 1212, 1038, 1583, 29889, 1635, 29879, 3552, 29949, 29892, 288, 876, 13, 13, 4801, 4489, 29879, 29898, 29920, 761, 29892, 1298, 29892, 5516, 29974, 264, 1125, 13, 361, 29918, 6207, 29892, 29875, 29945, 8667, 7295, 13, 1311, 29889, 5182, 448, 772, 326, 29873, 13, 29937, 302, 29876, 935, 16669, 2943, 408, 438, 13, 911, 29989, 29873, 29892, 7720, 29961, 3149, 29913, 353, 29871, 29900, 13, 7345, 270, 297, 1583, 29892, 29900, 262, 29901, 13, 29878, 486, 29918, 29886, 29900, 326, 29873, 353, 313, 29929, 29877, 326, 29873, 29961, 29900, 10062, 1311, 29889, 3972, 29961, 29877, 850, 29900, 511, 282, 29949, 524, 29961, 29896, 10062, 344, 29989, 29888, 29889, 3972, 29961, 29900, 3816, 29875, 5262, 13, 361, 29918, 6207, 19423, 273, 25363, 29941, 29898, 29881, 29897, 322, 1583, 29892, 629, 29896, 29900, 29961, 29885, 486, 29918, 29886, 29949, 262, 29974, 21540, 1360, 260, 601, 271, 10998, 524, 29374, 13, 6207, 29889, 29885, 283, 29872, 29898, 29881, 29913, 13, 29920, 761, 29889, 29069, 29898, 29876, 1193, 29918, 3149, 29952, 611, 29938, 357, 29897, 13, 8247, 29955, 29941, 29878, 29889, 29885, 29949, 345, 29898, 29879, 2152, 29889, 11569, 29918, 29881, 29991, 29878, 29961, 29881, 1800, 13, 13, 1753, 289, 5847, 29898, 29945, 29872, 29989, 29888, 29892, 380, 29987, 2273, 1125, 13, 311, 29929, 
434, 448, 313, 29949, 645, 29941, 1953, 29892, 311, 29939, 345, 4197, 29898, 303, 29946, 593, 29892, 29871, 29900, 29897, 2314, 13, 1332, 29991, 280, 270, 30181, 29939, 29894, 30181, 29901, 13, 3149, 29892, 4331, 353, 316, 802, 29889, 7323, 1563, 580, 13, 1896, 1298, 353, 29899, 29871, 29945, 761, 29889, 29873, 574, 29941, 29873, 29901, 13, 2457, 4331, 13, 29888, 29900, 29878, 270, 29918, 326, 1583, 29889, 29900, 262, 29889, 791, 1960, 7295, 13, 29876, 486, 29918, 3149, 29918, 29899, 313, 29886, 29900, 524, 29961, 29900, 29962, 29911, 29881, 29961, 29949, 1118, 29871, 29929, 29877, 29896, 593, 29961, 29896, 10062, 29881, 29961, 29896, 12258, 13, 277, 29918, 1311, 29892, 29929, 2429, 29961, 29876, 486, 29918, 1129, 262, 28135, 1275, 29871, 29900, 29901, 13, 29881, 30181, 802, 29892, 932, 8154, 3552, 29876, 486, 29918, 29939, 2461, 29892, 269, 29955, 1022, 29974, 29875, 876, 13, 29937, 286, 29992, 17697, 1998, 29991, 29974, 287, 2943, 2698, 448, 29896, 13, 1311, 29892, 5138, 333, 29961, 29876, 486, 29918, 3149, 29897, 353, 448, 29896, 29961, 29914, 25580, 29962, 1670, 526, 3196, 5626, 411, 596, 775, 763, 17422, 3459, 2983, 29892, 2743, 5132, 5877, 2992, 29889, 13, 13, 10605, 338, 920, 393, 775, 881, 1106, 29901, 13, 13, 28956, 4691, 13, 5215, 16250, 13, 13, 1990, 24380, 29901, 13, 1678, 822, 1284, 21322, 342, 2605, 29898, 1311, 29892, 5835, 29901, 525, 5756, 19203, 1495, 1599, 938, 29901, 13, 4706, 1583, 29889, 3972, 353, 11117, 29965, 2396, 21069, 29896, 29892, 29871, 29900, 1402, 29915, 29928, 2396, 518, 29896, 29892, 29871, 29900, 1402, 525, 29931, 2396, 518, 29900, 6653, 29896, 1402, 525, 29934, 2396, 518, 29900, 29892, 29871, 29896, 12258, 13, 4706, 1583, 29889, 11569, 29918, 3972, 353, 11117, 29965, 2396, 525, 29928, 742, 525, 29928, 2396, 525, 29965, 742, 525, 29931, 2396, 525, 29934, 742, 525, 29934, 2396, 525, 29931, 10827, 13, 4706, 1583, 29889, 7720, 353, 16250, 29889, 4381, 8977, 29898, 2892, 29901, 7411, 877, 7192, 8785, 13, 4706, 396, 360, 9998, 2791, 
714, 278, 2989, 2910, 13, 4706, 1583, 29889, 5182, 353, 6213, 13, 4706, 1583, 29889, 29069, 3552, 29900, 29892, 29871, 29900, 511, 5835, 29897, 13, 4706, 565, 1583, 29889, 5182, 338, 6213, 29901, 13, 9651, 736, 448, 29896, 13, 4706, 396, 350, 9998, 1284, 278, 3273, 342, 2224, 13, 4706, 736, 1583, 29889, 1635, 29879, 3552, 29900, 29892, 29871, 29900, 876, 13, 13, 1678, 822, 4489, 29879, 29898, 1311, 29892, 1298, 29892, 5835, 1125, 13, 4706, 565, 5835, 29889, 275, 8667, 7295, 13, 9651, 1583, 29889, 5182, 353, 1298, 13, 4706, 396, 2791, 16669, 2943, 408, 29871, 29900, 13, 4706, 1583, 29889, 7720, 29961, 3149, 29962, 353, 29871, 29900, 13, 4706, 363, 270, 297, 1583, 29889, 3972, 29901, 13, 9651, 302, 486, 29918, 3149, 353, 313, 3149, 29961, 29900, 10062, 1311, 29889, 3972, 29961, 29881, 3816, 29900, 1402, 1298, 29961, 29896, 10062, 1311, 29889, 3972, 29961, 29881, 3816, 29896, 2314, 13, 9651, 565, 5835, 29889, 3068, 16619, 29898, 29881, 29897, 322, 1583, 29889, 7720, 29961, 29876, 486, 29918, 3149, 29962, 1275, 5785, 877, 7192, 29374, 13, 18884, 5835, 29889, 11631, 29898, 29881, 29897, 13, 18884, 1583, 29889, 29069, 29898, 29876, 486, 29918, 3149, 29892, 5835, 29897, 13, 18884, 5835, 29889, 11631, 29898, 1311, 29889, 11569, 29918, 3972, 29961, 29881, 2314, 13, 13, 1678, 822, 289, 5847, 29898, 1311, 29892, 1369, 1125, 13, 4706, 316, 802, 353, 16250, 29889, 311, 802, 4197, 29898, 2962, 29892, 29871, 29900, 29897, 2314, 13, 4706, 1550, 316, 802, 29901, 13, 9651, 1298, 29892, 4331, 353, 316, 802, 29889, 7323, 1563, 580, 13, 9651, 565, 1298, 1275, 1583, 29889, 5182, 29901, 13, 18884, 736, 4331, 13, 9651, 363, 270, 297, 1583, 29889, 3972, 29889, 5975, 7295, 13, 18884, 302, 486, 29918, 3149, 353, 313, 3149, 29961, 29900, 10062, 29881, 29961, 29900, 1402, 1298, 29961, 29896, 10062, 29881, 29961, 29896, 2314, 13, 18884, 565, 1583, 29889, 7720, 29961, 29876, 486, 29918, 3149, 29962, 1275, 29871, 29900, 29901, 13, 462, 1678, 316, 802, 29889, 4397, 3552, 29876, 486, 29918, 3149, 
29892, 4331, 29974, 29896, 876, 13, 462, 1678, 396, 2791, 16669, 2943, 408, 448, 29896, 13, 462, 1678, 1583, 29889, 7720, 29961, 29876, 486, 29918, 3149, 29962, 353, 448, 29896, 13, 28956, 13, 12148, 4443, 393, 366, 3282, 29915, 29873, 3867, 278, 421, 5756, 19203, 29952, 770, 29892, 8151, 366, 505, 304, 1207, 1854, 393, 445, 4413, 505, 1906, 421, 275, 8667, 1673, 421, 3068, 16619, 29952, 322, 421, 11631, 29952, 3519, 297, 1797, 445, 775, 304, 664, 29889, 2] labels```python # Establish an integer list arr = [1, 2, 3, 4] # Determine the length of the list n = len(arr) # Initialize index at 0 i = 0 # Traverse the list and output each individual element while i < n: print(arr[i]) i += 1 ``` In the given code, it tries to access `arr[n]` which is out of bounds as python list indexing starts from 0 and ends at n-1 for a list of length n. Also, I used `while` loop as per your request. By initializing `i=0` before the loop and `i += 1` within the loop, we can iterate over the list. This iteration continues until `i` is less than `n`, the length of the list. The `while` loop will not proceed if the list is empty so it correctly handles empty lists. inputs: [INST]Please amend the subsequent Python script so that it includes a 'while' loop rather than the existing 'for' loop, which iterates through the items of an integer list. The script currently has a bug where it attempts to print an object that is outside the bounds of the list. Fix this error and modify the script to use 'while' instead of 'for' loop. Ensure your script correctly handles empty lists. 
```python # Establish an integer list arr = [1, 2, 3, 4] # Determine the length of the list n = len(arr) # Traverse the list and output each individual element for i in range(n+1): print(arr[i]) ```[/INST] ```python # Establish an integer list arr = [1, 2, 3, 4] # Determine the length of the list n = len(arr) # Initialize index at 0 i = 0 # Traverse the list and output each individual element while i < n: print(arr[i]) i += 1 ``` In the given code, it tries to access `arr[n]` which is out of bounds as python list indexing starts from 0 and ends at n-1 for a list of length n. Also, I used `while` loop as per your request. By initializing `i=0` before the loop and `i += 1` within the loop, we can iterate over the list. This iteration continues until `i` is less than `n`, the length of the list. The `while` loop will not proceed if the list is empty so it correctly handles empty lists. label_ids: [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 7521, 4691, 
13, 29937, 2661, 370, 1674, 385, 6043, 1051, 13, 2749, 353, 518, 29896, 29892, 29871, 29906, 29892, 29871, 29941, 29892, 29871, 29946, 29962, 13, 13, 29937, 5953, 837, 457, 278, 3309, 310, 278, 1051, 13, 29876, 353, 7431, 29898, 2749, 29897, 13, 13, 29937, 25455, 2380, 472, 29871, 29900, 13, 29875, 353, 29871, 29900, 13, 13, 29937, 3201, 3901, 278, 1051, 322, 1962, 1269, 5375, 1543, 13, 8000, 474, 529, 302, 29901, 13, 1678, 1596, 29898, 2749, 29961, 29875, 2314, 13, 1678, 474, 4619, 29871, 29896, 13, 28956, 13, 797, 278, 2183, 775, 29892, 372, 14335, 304, 2130, 421, 2749, 29961, 29876, 7961, 607, 338, 714, 310, 13451, 408, 3017, 1051, 26190, 8665, 515, 29871, 29900, 322, 10614, 472, 302, 29899, 29896, 363, 263, 1051, 310, 3309, 302, 29889, 13, 17351, 29892, 306, 1304, 421, 8000, 29952, 2425, 408, 639, 596, 2009, 29889, 2648, 2847, 5281, 421, 29875, 29922, 29900, 29952, 1434, 278, 2425, 322, 421, 29875, 4619, 29871, 29896, 29952, 2629, 278, 2425, 29892, 591, 508, 13649, 975, 278, 1051, 29889, 910, 12541, 18172, 2745, 421, 29875, 29952, 338, 3109, 1135, 421, 29876, 1673, 278, 3309, 310, 278, 1051, 29889, 450, 421, 8000, 29952, 2425, 674, 451, 8469, 565, 278, 1051, 338, 4069, 577, 372, 5149, 17766, 4069, 8857, 29889, 2] length of input_ids : 1212 input_ids: [1, 518, 25580, 29962, 29875, 29915, 345, 2355, 445, 3017, 775, 515, 385, 288, 7283, 5780, 29892, 541, 372, 29915, 29879, 451, 1985, 29889, 508, 366, 4744, 372, 363, 592, 29973, 13, 1990, 29871, 29945, 324, 29894, 29955, 29991, 265, 29901, 13, 29877, 300, 260, 513, 21322, 30181, 303, 11457, 29876, 1566, 761, 29952, 302, 29876, 29987, 29879, 29955, 261, 29901, 525, 5756, 29924, 29987, 29879, 29974, 261, 1495, 448, 29966, 29871, 29896, 2273, 29901, 13, 1311, 29889, 29877, 29896, 29878, 353, 11117, 29965, 2396, 21069, 29896, 29892, 29871, 29900, 1822, 29918, 29915, 29900, 2396, 518, 29896, 29892, 29871, 29900, 1118, 525, 29931, 2396, 518, 29900, 29892, 29918, 29899, 29896, 1822, 525, 29934, 2396, 426, 29900, 29892, 
29871, 29896, 12258, 13, 29879, 2152, 29889, 11569, 29918, 29881, 29991, 29878, 448, 11117, 29963, 2396, 525, 29928, 742, 525, 29928, 2396, 525, 29965, 742, 525, 29931, 2396, 525, 29934, 742, 525, 29934, 2396, 525, 29931, 10827, 13, 29920, 30181, 29989, 29888, 29889, 29887, 5378, 353, 313, 324, 781, 29875, 4835, 29889, 4381, 27774, 29955, 29898, 2892, 29901, 29918, 29888, 417, 29946, 29873, 877, 3083, 29888, 11287, 13, 29937, 360, 9998, 2791, 714, 278, 2989, 2910, 13, 29879, 30181, 29989, 29888, 29889, 12637, 479, 29974, 448, 10050, 30181, 13, 29945, 761, 29889, 29069, 3552, 29900, 29892, 29871, 29900, 511, 286, 29987, 303, 29941, 29878, 29897, 13, 29896, 29888, 395, 29872, 277, 29892, 29873, 29946, 29878, 657, 29918, 275, 390, 650, 29901, 13, 2267, 13416, 29876, 448, 29896, 13, 29950, 29871, 29947, 9998, 1284, 278, 5807, 29900, 29878, 2167, 29955, 29871, 29929, 29992, 29974, 29882, 13, 1212, 1038, 1583, 29889, 1635, 29879, 3552, 29949, 29892, 288, 876, 13, 13, 4801, 4489, 29879, 29898, 29920, 761, 29892, 1298, 29892, 5516, 29974, 264, 1125, 13, 361, 29918, 6207, 29892, 29875, 29945, 8667, 7295, 13, 1311, 29889, 5182, 448, 772, 326, 29873, 13, 29937, 302, 29876, 935, 16669, 2943, 408, 438, 13, 911, 29989, 29873, 29892, 7720, 29961, 3149, 29913, 353, 29871, 29900, 13, 7345, 270, 297, 1583, 29892, 29900, 262, 29901, 13, 29878, 486, 29918, 29886, 29900, 326, 29873, 353, 313, 29929, 29877, 326, 29873, 29961, 29900, 10062, 1311, 29889, 3972, 29961, 29877, 850, 29900, 511, 282, 29949, 524, 29961, 29896, 10062, 344, 29989, 29888, 29889, 3972, 29961, 29900, 3816, 29875, 5262, 13, 361, 29918, 6207, 19423, 273, 25363, 29941, 29898, 29881, 29897, 322, 1583, 29892, 629, 29896, 29900, 29961, 29885, 486, 29918, 29886, 29949, 262, 29974, 21540, 1360, 260, 601, 271, 10998, 524, 29374, 13, 6207, 29889, 29885, 283, 29872, 29898, 29881, 29913, 13, 29920, 761, 29889, 29069, 29898, 29876, 1193, 29918, 3149, 29952, 611, 29938, 357, 29897, 13, 8247, 29955, 29941, 29878, 29889, 29885, 
29949, 345, 29898, 29879, 2152, 29889, 11569, 29918, 29881, 29991, 29878, 29961, 29881, 1800, 13, 13, 1753, 289, 5847, 29898, 29945, 29872, 29989, 29888, 29892, 380, 29987, 2273, 1125, 13, 311, 29929, 434, 448, 313, 29949, 645, 29941, 1953, 29892, 311, 29939, 345, 4197, 29898, 303, 29946, 593, 29892, 29871, 29900, 29897, 2314, 13, 1332, 29991, 280, 270, 30181, 29939, 29894, 30181, 29901, 13, 3149, 29892, 4331, 353, 316, 802, 29889, 7323, 1563, 580, 13, 1896, 1298, 353, 29899, 29871, 29945, 761, 29889, 29873, 574, 29941, 29873, 29901, 13, 2457, 4331, 13, 29888, 29900, 29878, 270, 29918, 326, 1583, 29889, 29900, 262, 29889, 791, 1960, 7295, 13, 29876, 486, 29918, 3149, 29918, 29899, 313, 29886, 29900, 524, 29961, 29900, 29962, 29911, 29881, 29961, 29949, 1118, 29871, 29929, 29877, 29896, 593, 29961, 29896, 10062, 29881, 29961, 29896, 12258, 13, 277, 29918, 1311, 29892, 29929, 2429, 29961, 29876, 486, 29918, 1129, 262, 28135, 1275, 29871, 29900, 29901, 13, 29881, 30181, 802, 29892, 932, 8154, 3552, 29876, 486, 29918, 29939, 2461, 29892, 269, 29955, 1022, 29974, 29875, 876, 13, 29937, 286, 29992, 17697, 1998, 29991, 29974, 287, 2943, 2698, 448, 29896, 13, 1311, 29892, 5138, 333, 29961, 29876, 486, 29918, 3149, 29897, 353, 448, 29896, 29961, 29914, 25580, 29962, 1670, 526, 3196, 5626, 411, 596, 775, 763, 17422, 3459, 2983, 29892, 2743, 5132, 5877, 2992, 29889, 13, 13, 10605, 338, 920, 393, 775, 881, 1106, 29901, 13, 13, 28956, 4691, 13, 5215, 16250, 13, 13, 1990, 24380, 29901, 13, 1678, 822, 1284, 21322, 342, 2605, 29898, 1311, 29892, 5835, 29901, 525, 5756, 19203, 1495, 1599, 938, 29901, 13, 4706, 1583, 29889, 3972, 353, 11117, 29965, 2396, 21069, 29896, 29892, 29871, 29900, 1402, 29915, 29928, 2396, 518, 29896, 29892, 29871, 29900, 1402, 525, 29931, 2396, 518, 29900, 6653, 29896, 1402, 525, 29934, 2396, 518, 29900, 29892, 29871, 29896, 12258, 13, 4706, 1583, 29889, 11569, 29918, 3972, 353, 11117, 29965, 2396, 525, 29928, 742, 525, 29928, 2396, 525, 29965, 742, 525, 
29931, 2396, 525, 29934, 742, 525, 29934, 2396, 525, 29931, 10827, 13, 4706, 1583, 29889, 7720, 353, 16250, 29889, 4381, 8977, 29898, 2892, 29901, 7411, 877, 7192, 8785, 13, 4706, 396, 360, 9998, 2791, 714, 278, 2989, 2910, 13, 4706, 1583, 29889, 5182, 353, 6213, 13, 4706, 1583, 29889, 29069, 3552, 29900, 29892, 29871, 29900, 511, 5835, 29897, 13, 4706, 565, 1583, 29889, 5182, 338, 6213, 29901, 13, 9651, 736, 448, 29896, 13, 4706, 396, 350, 9998, 1284, 278, 3273, 342, 2224, 13, 4706, 736, 1583, 29889, 1635, 29879, 3552, 29900, 29892, 29871, 29900, 876, 13, 13, 1678, 822, 4489, 29879, 29898, 1311, 29892, 1298, 29892, 5835, 1125, 13, 4706, 565, 5835, 29889, 275, 8667, 7295, 13, 9651, 1583, 29889, 5182, 353, 1298, 13, 4706, 396, 2791, 16669, 2943, 408, 29871, 29900, 13, 4706, 1583, 29889, 7720, 29961, 3149, 29962, 353, 29871, 29900, 13, 4706, 363, 270, 297, 1583, 29889, 3972, 29901, 13, 9651, 302, 486, 29918, 3149, 353, 313, 3149, 29961, 29900, 10062, 1311, 29889, 3972, 29961, 29881, 3816, 29900, 1402, 1298, 29961, 29896, 10062, 1311, 29889, 3972, 29961, 29881, 3816, 29896, 2314, 13, 9651, 565, 5835, 29889, 3068, 16619, 29898, 29881, 29897, 322, 1583, 29889, 7720, 29961, 29876, 486, 29918, 3149, 29962, 1275, 5785, 877, 7192, 29374, 13, 18884, 5835, 29889, 11631, 29898, 29881, 29897, 13, 18884, 1583, 29889, 29069, 29898, 29876, 486, 29918, 3149, 29892, 5835, 29897, 13, 18884, 5835, 29889, 11631, 29898, 1311, 29889, 11569, 29918, 3972, 29961, 29881, 2314, 13, 13, 1678, 822, 289, 5847, 29898, 1311, 29892, 1369, 1125, 13, 4706, 316, 802, 353, 16250, 29889, 311, 802, 4197, 29898, 2962, 29892, 29871, 29900, 29897, 2314, 13, 4706, 1550, 316, 802, 29901, 13, 9651, 1298, 29892, 4331, 353, 316, 802, 29889, 7323, 1563, 580, 13, 9651, 565, 1298, 1275, 1583, 29889, 5182, 29901, 13, 18884, 736, 4331, 13, 9651, 363, 270, 297, 1583, 29889, 3972, 29889, 5975, 7295, 13, 18884, 302, 486, 29918, 3149, 353, 313, 3149, 29961, 29900, 10062, 29881, 29961, 29900, 1402, 1298, 29961, 29896, 
10062, 29881, 29961, 29896, 2314, 13, 18884, 565, 1583, 29889, 7720, 29961, 29876, 486, 29918, 3149, 29962, 1275, 29871, 29900, 29901, 13, 462, 1678, 316, 802, 29889, 4397, 3552, 29876, 486, 29918, 3149, 29892, 4331, 29974, 29896, 876, 13, 462, 1678, 396, 2791, 16669, 2943, 408, 448, 29896, 13, 462, 1678, 1583, 29889, 7720, 29961, 29876, 486, 29918, 3149, 29962, 353, 448, 29896, 13, 28956, 13, 12148, 4443, 393, 366, 3282, 29915, 29873, 3867, 278, 421, 5756, 19203, 29952, 770, 29892, 8151, 366, 505, 304, 1207, 1854, 393, 445, 4413, 505, 1906, 421, 275, 8667, 1673, 421, 3068, 16619, 29952, 322, 421, 11631, 29952, 3519, 297, 1797, 445, 775, 304, 664, 29889, 2] inputs: [INST]Please amend the subsequent Python script so that it includes a 'while' loop rather than the existing 'for' loop, which iterates through the items of an integer list. The script currently has a bug where it attempts to print an object that is outside the bounds of the list. Fix this error and modify the script to use 'while' instead of 'for' loop. Ensure your script correctly handles empty lists. ```python # Establish an integer list arr = [1, 2, 3, 4] # Determine the length of the list n = len(arr) # Traverse the list and output each individual element for i in range(n+1): print(arr[i]) ```[/INST] ```python # Establish an integer list arr = [1, 2, 3, 4] # Determine the length of the list n = len(arr) # Initialize index at 0 i = 0 # Traverse the list and output each individual element while i < n: print(arr[i]) i += 1 ``` In the given code, it tries to access `arr[n]` which is out of bounds as python list indexing starts from 0 and ends at n-1 for a list of length n. Also, I used `while` loop as per your request. By initializing `i=0` before the loop and `i += 1` within the loop, we can iterate over the list. This iteration continues until `i` is less than `n`, the length of the list. The `while` loop will not proceed if the list is empty so it correctly handles empty lists. 
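In the `label_ids` record, every prompt position is `-100`. This runs from the leading BOS id `1` through the `[/INST]` marker. The response tokens (starting `7521, 4691, ...`) and the final EOS id `2` are copied unchanged from `input_ids`. `-100` is the default `ignore_index` of PyTorch's `torch.nn.CrossEntropyLoss`, so only the response contributes to the loss.

Below is a minimal pure-Python sketch of that masking. It assumes the prompt and response were tokenized separately and that BOS is `1` and EOS is `2`, as the arrays above suggest. The helper names are illustrative and are not taken from the script that wrote this log.

```python
from typing import List, Tuple

# Default ignore_index of torch.nn.CrossEntropyLoss: positions carrying this
# label are skipped when the loss is computed.
IGNORE_INDEX = -100


def build_example(prompt_ids: List[int], response_ids: List[int],
                  bos_id: int = 1, eos_id: int = 2) -> Tuple[List[int], List[int]]:
    """Return (input_ids, label_ids) for BOS + prompt + response + EOS."""
    input_ids = [bos_id] + prompt_ids + response_ids + [eos_id]
    # BOS and every prompt token are masked; the response and EOS are kept,
    # so the model is trained to produce the answer and then stop.
    label_ids = [IGNORE_INDEX] * (1 + len(prompt_ids)) + response_ids + [eos_id]
    return input_ids, label_ids


def labels_aligned(input_ids: List[int], label_ids: List[int]) -> bool:
    """Check that every unmasked label equals the input token at that position."""
    if len(input_ids) != len(label_ids):
        return False
    return all(lab in (IGNORE_INDEX, tok) for tok, lab in zip(input_ids, label_ids))


def num_masked(label_ids: List[int]) -> int:
    """Count leading masked positions (BOS plus the prompt length)."""
    count = 0
    for lab in label_ids:
        if lab != IGNORE_INDEX:
            break
        count += 1
    return count


# Usage with a few token ids copied from the arrays above.
inp, lab = build_example([518, 25580, 29962], [7521, 4691])
```

Under this construction, `labels_aligned` should hold for every logged record. That makes it a cheap sanity check to run on a handful of samples before starting a long job.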
label_idslabels```python # Establish an integer list arr = [1, 2, 3, 4] # Determine the length of the list n = len(arr) # Initialize index at 0 i = 0 # Traverse the list and output each individual element while i < n: print(arr[i]) i += 1 ``` In the given code, it tries to access `arr[n]` which is out of bounds as python list indexing starts from 0 and ends at n-1 for a list of length n. Also, I used `while` loop as per your request. By initializing `i=0` before the loop and `i += 1` within the loop, we can iterate over the list. This iteration continues until `i` is less than `n`, the length of the list. The `while` loop will not proceed if the list is empty so it correctly handles empty lists. length of input_ids : 1212 input_ids: [1, 518, 25580, 29962, 29875, 29915, 345, 2355, 445, 3017, 775, 515, 385, 288, 7283, 5780, 29892, 541, 372, 29915, 29879, 451, 1985, 29889, 508, 366, 4744, 372, 363, 592, 29973, 13, 1990, 29871, 29945, 324, 29894, 29955, 29991, 265, 29901, 13, 29877, 300, 260, 513, 21322, 30181, 303, 11457, 29876, 1566, 761, 29952, 302, 29876, 29987, 29879, 29955, 261, 29901, 525, 5756, 29924, 29987, 29879, 29974, 261, 1495, 448, 29966, 29871, 29896, 2273, 29901, 13, 1311, 29889, 29877, 29896, 29878, 353, 11117, 29965, 2396, 21069, 29896, 29892, 29871, 29900, 1822, 29918, 29915, 29900, 2396, 518, 29896, 29892, 29871, 29900, 1118, 525, 29931, 2396, 518, 29900, 29892, 29918, 29899, 29896, 1822, 525, 29934, 2396, 426, 29900, 29892, 29871, 29896, 12258, 13, 29879, 2152, 29889, 11569, 29918, 29881, 29991, 29878, 448, 11117, 29963, 2396, 525, 29928, 742, 525, 29928, 2396, 525, 29965, 742, 525, 29931, 2396, 525, 29934, 742, 525, 29934, 2396, 525, 29931, 10827, 13, 29920, 30181, 29989, 29888, 29889, 29887, 5378, 353, 313, 324, 781, 29875, 4835, 29889, 4381, 27774, 29955, 29898, 2892, 29901, 29918, 29888, 417, 29946, 29873, 877, 3083, 29888, 11287, 13, 29937, 360, 9998, 2791, 714, 278, 2989, 2910, 13, 29879, 30181, 29989, 29888, 29889, 12637, 479, 29974, 448, 
10050, 30181, 13, 29945, 761, 29889, 29069, 3552, 29900, 29892, 29871, 29900, 511, 286, 29987, 303, 29941, 29878, 29897, 13, 29896, 29888, 395, 29872, 277, 29892, 29873, 29946, 29878, 657, 29918, 275, 390, 650, 29901, 13, 2267, 13416, 29876, 448, 29896, 13, 29950, 29871, 29947, 9998, 1284, 278, 5807, 29900, 29878, 2167, 29955, 29871, 29929, 29992, 29974, 29882, 13, 1212, 1038, 1583, 29889, 1635, 29879, 3552, 29949, 29892, 288, 876, 13, 13, 4801, 4489, 29879, 29898, 29920, 761, 29892, 1298, 29892, 5516, 29974, 264, 1125, 13, 361, 29918, 6207, 29892, 29875, 29945, 8667, 7295, 13, 1311, 29889, 5182, 448, 772, 326, 29873, 13, 29937, 302, 29876, 935, 16669, 2943, 408, 438, 13, 911, 29989, 29873, 29892, 7720, 29961, 3149, 29913, 353, 29871, 29900, 13, 7345, 270, 297, 1583, 29892, 29900, 262, 29901, 13, 29878, 486, 29918, 29886, 29900, 326, 29873, 353, 313, 29929, 29877, 326, 29873, 29961, 29900, 10062, 1311, 29889, 3972, 29961, 29877, 850, 29900, 511, 282, 29949, 524, 29961, 29896, 10062, 344, 29989, 29888, 29889, 3972, 29961, 29900, 3816, 29875, 5262, 13, 361, 29918, 6207, 19423, 273, 25363, 29941, 29898, 29881, 29897, 322, 1583, 29892, 629, 29896, 29900, 29961, 29885, 486, 29918, 29886, 29949, 262, 29974, 21540, 1360, 260, 601, 271, 10998, 524, 29374, 13, 6207, 29889, 29885, 283, 29872, 29898, 29881, 29913, 13, 29920, 761, 29889, 29069, 29898, 29876, 1193, 29918, 3149, 29952, 611, 29938, 357, 29897, 13, 8247, 29955, 29941, 29878, 29889, 29885, 29949, 345, 29898, 29879, 2152, 29889, 11569, 29918, 29881, 29991, 29878, 29961, 29881, 1800, 13, 13, 1753, 289, 5847, 29898, 29945, 29872, 29989, 29888, 29892, 380, 29987, 2273, 1125, 13, 311, 29929, 434, 448, 313, 29949, 645, 29941, 1953, 29892, 311, 29939, 345, 4197, 29898, 303, 29946, 593, 29892, 29871, 29900, 29897, 2314, 13, 1332, 29991, 280, 270, 30181, 29939, 29894, 30181, 29901, 13, 3149, 29892, 4331, 353, 316, 802, 29889, 7323, 1563, 580, 13, 1896, 1298, 353, 29899, 29871, 29945, 761, 29889, 29873, 574, 29941, 29873, 
29901, 13, 2457, 4331, 13, 29888, 29900, 29878, 270, 29918, 326, 1583, 29889, 29900, 262, 29889, 791, 1960, 7295, 13, 29876, 486, 29918, 3149, 29918, 29899, 313, 29886, 29900, 524, 29961, 29900, 29962, 29911, 29881, 29961, 29949, 1118, 29871, 29929, 29877, 29896, 593, 29961, 29896, 10062, 29881, 29961, 29896, 12258, 13, 277, 29918, 1311, 29892, 29929, 2429, 29961, 29876, 486, 29918, 1129, 262, 28135, 1275, 29871, 29900, 29901, 13, 29881, 30181, 802, 29892, 932, 8154, 3552, 29876, 486, 29918, 29939, 2461, 29892, 269, 29955, 1022, 29974, 29875, 876, 13, 29937, 286, 29992, 17697, 1998, 29991, 29974, 287, 2943, 2698, 448, 29896, 13, 1311, 29892, 5138, 333, 29961, 29876, 486, 29918, 3149, 29897, 353, 448, 29896, 29961, 29914, 25580, 29962, 1670, 526, 3196, 5626, 411, 596, 775, 763, 17422, 3459, 2983, 29892, 2743, 5132, 5877, 2992, 29889, 13, 13, 10605, 338, 920, 393, 775, 881, 1106, 29901, 13, 13, 28956, 4691, 13, 5215, 16250, 13, 13, 1990, 24380, 29901, 13, 1678, 822, 1284, 21322, 342, 2605, 29898, 1311, 29892, 5835, 29901, 525, 5756, 19203, 1495, 1599, 938, 29901, 13, 4706, 1583, 29889, 3972, 353, 11117, 29965, 2396, 21069, 29896, 29892, 29871, 29900, 1402, 29915, 29928, 2396, 518, 29896, 29892, 29871, 29900, 1402, 525, 29931, 2396, 518, 29900, 6653, 29896, 1402, 525, 29934, 2396, 518, 29900, 29892, 29871, 29896, 12258, 13, 4706, 1583, 29889, 11569, 29918, 3972, 353, 11117, 29965, 2396, 525, 29928, 742, 525, 29928, 2396, 525, 29965, 742, 525, 29931, 2396, 525, 29934, 742, 525, 29934, 2396, 525, 29931, 10827, 13, 4706, 1583, 29889, 7720, 353, 16250, 29889, 4381, 8977, 29898, 2892, 29901, 7411, 877, 7192, 8785, 13, 4706, 396, 360, 9998, 2791, 714, 278, 2989, 2910, 13, 4706, 1583, 29889, 5182, 353, 6213, 13, 4706, 1583, 29889, 29069, 3552, 29900, 29892, 29871, 29900, 511, 5835, 29897, 13, 4706, 565, 1583, 29889, 5182, 338, 6213, 29901, 13, 9651, 736, 448, 29896, 13, 4706, 396, 350, 9998, 1284, 278, 3273, 342, 2224, 13, 4706, 736, 1583, 29889, 1635, 29879, 3552, 29900, 
29892, 29871, 29900, 876, 13, 13, 1678, 822, 4489, 29879, 29898, 1311, 29892, 1298, 29892, 5835, 1125, 13, 4706, 565, 5835, 29889, 275, 8667, 7295, 13, 9651, 1583, 29889, 5182, 353, 1298, 13, 4706, 396, 2791, 16669, 2943, 408, 29871, 29900, 13, 4706, 1583, 29889, 7720, 29961, 3149, 29962, 353, 29871, 29900, 13, 4706, 363, 270, 297, 1583, 29889, 3972, 29901, 13, 9651, 302, 486, 29918, 3149, 353, 313, 3149, 29961, 29900, 10062, 1311, 29889, 3972, 29961, 29881, 3816, 29900, 1402, 1298, 29961, 29896, 10062, 1311, 29889, 3972, 29961, 29881, 3816, 29896, 2314, 13, 9651, 565, 5835, 29889, 3068, 16619, 29898, 29881, 29897, 322, 1583, 29889, 7720, 29961, 29876, 486, 29918, 3149, 29962, 1275, 5785, 877, 7192, 29374, 13, 18884, 5835, 29889, 11631, 29898, 29881, 29897, 13, 18884, 1583, 29889, 29069, 29898, 29876, 486, 29918, 3149, 29892, 5835, 29897, 13, 18884, 5835, 29889, 11631, 29898, 1311, 29889, 11569, 29918, 3972, 29961, 29881, 2314, 13, 13, 1678, 822, 289, 5847, 29898, 1311, 29892, 1369, 1125, 13, 4706, 316, 802, 353, 16250, 29889, 311, 802, 4197, 29898, 2962, 29892, 29871, 29900, 29897, 2314, 13, 4706, 1550, 316, 802, 29901, 13, 9651, 1298, 29892, 4331, 353, 316, 802, 29889, 7323, 1563, 580, 13, 9651, 565, 1298, 1275, 1583, 29889, 5182, 29901, 13, 18884, 736, 4331, 13, 9651, 363, 270, 297, 1583, 29889, 3972, 29889, 5975, 7295, 13, 18884, 302, 486, 29918, 3149, 353, 313, 3149, 29961, 29900, 10062, 29881, 29961, 29900, 1402, 1298, 29961, 29896, 10062, 29881, 29961, 29896, 2314, 13, 18884, 565, 1583, 29889, 7720, 29961, 29876, 486, 29918, 3149, 29962, 1275, 29871, 29900, 29901, 13, 462, 1678, 316, 802, 29889, 4397, 3552, 29876, 486, 29918, 3149, 29892, 4331, 29974, 29896, 876, 13, 462, 1678, 396, 2791, 16669, 2943, 408, 448, 29896, 13, 462, 1678, 1583, 29889, 7720, 29961, 29876, 486, 29918, 3149, 29962, 353, 448, 29896, 13, 28956, 13, 12148, 4443, 393, 366, 3282, 29915, 29873, 3867, 278, 421, 5756, 19203, 29952, 770, 29892, 8151, 366, 505, 304, 1207, 1854, 393, 445, 4413, 
505, 1906, 421, 275, 8667, 1673, 421, 3068, 16619, 29952, 322, 421, 11631, 29952, 3519, 297, 1797, 445, 775, 304, 664, 29889, 2] labels```python # Establish an integer list arr = [1, 2, 3, 4] # Determine the length of the list n = len(arr) # Initialize index at 0 i = 0 # Traverse the list and output each individual element while i < n: print(arr[i]) i += 1 ``` In the given code, it tries to access `arr[n]` which is out of bounds as python list indexing starts from 0 and ends at n-1 for a list of length n. Also, I used `while` loop as per your request. By initializing `i=0` before the loop and `i += 1` within the loop, we can iterate over the list. This iteration continues until `i` is less than `n`, the length of the list. The `while` loop will not proceed if the list is empty so it correctly handles empty lists. length of input_ids : 394 input_ids: [1, 518, 25580, 29962, 12148, 626, 355, 278, 15352, 5132, 2471, 577, 393, 372, 7805, 263, 525, 8000, 29915, 2425, 3265, 1135, 278, 5923, 525, 1454, 29915, 2425, 29892, 607, 4256, 1078, 1549, 278, 4452, 310, 385, 6043, 1051, 29889, 13, 13, 1576, 2471, 5279, 756, 263, 6494, 988, 372, 14734, 304, 1596, 385, 1203, 393, 338, 5377, 278, 13451, 310, 278, 1051, 29889, 24778, 445, 1059, 322, 6623, 278, 2471, 304, 671, 525, 8000, 29915, 2012, 310, 525, 1454, 29915, 2425, 29889, 22521, 545, 596, 2471, 5149, 17766, 4069, 8857, 29889, 29871, 13, 13, 28956, 4691, 13, 29871, 396, 2661, 370, 1674, 385, 6043, 1051, 13, 29871, 3948, 353, 518, 29896, 29892, 29871, 29906, 29892, 29871, 29941, 29892, 29871, 29946, 29962, 13, 13, 29871, 396, 5953, 837, 457, 278, 3309, 310, 278, 1051, 13, 29871, 302, 353, 7431, 29898, 2749, 29897, 13, 13, 29871, 396, 3201, 3901, 278, 1051, 322, 1962, 1269, 5375, 1543, 13, 29871, 363, 474, 297, 3464, 29898, 29876, 29974, 29896, 1125, 13, 418, 1596, 29898, 2749, 29961, 29875, 2314, 13, 28956, 29961, 29914, 25580, 29962, 7521, 4691, 13, 29937, 2661, 370, 1674, 385, 6043, 1051, 13, 2749, 353, 518, 29896, 29892, 
29871, 29906, 29892, 29871, 29941, 29892, 29871, 29946, 29962, 13, 13, 29937, 5953, 837, 457, 278, 3309, 310, 278, 1051, 13, 29876, 353, 7431, 29898, 2749, 29897, 13, 13, 29937, 25455, 2380, 472, 29871, 29900, 13, 29875, 353, 29871, 29900, 13, 13, 29937, 3201, 3901, 278, 1051, 322, 1962, 1269, 5375, 1543, 13, 8000, 474, 529, 302, 29901, 13, 1678, 1596, 29898, 2749, 29961, 29875, 2314, 13, 1678, 474, 4619, 29871, 29896, 13, 28956, 13, 797, 278, 2183, 775, 29892, 372, 14335, 304, 2130, 421, 2749, 29961, 29876, 7961, 607, 338, 714, 310, 13451, 408, 3017, 1051, 26190, 8665, 515, 29871, 29900, 322, 10614, 472, 302, 29899, 29896, 363, 263, 1051, 310, 3309, 302, 29889, 13, 17351, 29892, 306, 1304, 421, 8000, 29952, 2425, 408, 639, 596, 2009, 29889, 2648, 2847, 5281, 421, 29875, 29922, 29900, 29952, 1434, 278, 2425, 322, 421, 29875, 4619, 29871, 29896, 29952, 2629, 278, 2425, 29892, 591, 508, 13649, 975, 278, 1051, 29889, 910, 12541, 18172, 2745, 421, 29875, 29952, 338, 3109, 1135, 421, 29876, 1673, 278, 3309, 310, 278, 1051, 29889, 450, 421, 8000, 29952, 2425, 674, 451, 8469, 565, 278, 1051, 338, 4069, 577, 372, 5149, 17766, 4069, 8857, 29889, 2] length of input_ids : 1212 input_ids: [1, 518, 25580, 29962, 29875, 29915, 345, 2355, 445, 3017, 775, 515, 385, 288, 7283, 5780, 29892, 541, 372, 29915, 29879, 451, 1985, 29889, 508, 366, 4744, 372, 363, 592, 29973, 13, 1990, 29871, 29945, 324, 29894, 29955, 29991, 265, 29901, 13, 29877, 300, 260, 513, 21322, 30181, 303, 11457, 29876, 1566, 761, 29952, 302, 29876, 29987, 29879, 29955, 261, 29901, 525, 5756, 29924, 29987, 29879, 29974, 261, 1495, 448, 29966, 29871, 29896, 2273, 29901, 13, 1311, 29889, 29877, 29896, 29878, 353, 11117, 29965, 2396, 21069, 29896, 29892, 29871, 29900, 1822, 29918, 29915, 29900, 2396, 518, 29896, 29892, 29871, 29900, 1118, 525, 29931, 2396, 518, 29900, 29892, 29918, 29899, 29896, 1822, 525, 29934, 2396, 426, 29900, 29892, 29871, 29896, 12258, 13, 29879, 2152, 29889, 11569, 29918, 29881, 29991, 29878, 
448, 11117, 29963, 2396, 525, 29928, 742, 525, 29928, 2396, 525, 29965, 742, 525, 29931, 2396, 525, 29934, 742, 525, 29934, 2396, 525, 29931, 10827, 13, 29920, 30181, 29989, 29888, 29889, 29887, 5378, 353, 313, 324, 781, 29875, 4835, 29889, 4381, 27774, 29955, 29898, 2892, 29901, 29918, 29888, 417, 29946, 29873, 877, 3083, 29888, 11287, 13, 29937, 360, 9998, 2791, 714, 278, 2989, 2910, 13, 29879, 30181, 29989, 29888, 29889, 12637, 479, 29974, 448, 10050, 30181, 13, 29945, 761, 29889, 29069, 3552, 29900, 29892, 29871, 29900, 511, 286, 29987, 303, 29941, 29878, 29897, 13, 29896, 29888, 395, 29872, 277, 29892, 29873, 29946, 29878, 657, 29918, 275, 390, 650, 29901, 13, 2267, 13416, 29876, 448, 29896, 13, 29950, 29871, 29947, 9998, 1284, 278, 5807, 29900, 29878, 2167, 29955, 29871, 29929, 29992, 29974, 29882, 13, 1212, 1038, 1583, 29889, 1635, 29879, 3552, 29949, 29892, 288, 876, 13, 13, 4801, 4489, 29879, 29898, 29920, 761, 29892, 1298, 29892, 5516, 29974, 264, 1125, 13, 361, 29918, 6207, 29892, 29875, 29945, 8667, 7295, 13, 1311, 29889, 5182, 448, 772, 326, 29873, 13, 29937, 302, 29876, 935, 16669, 2943, 408, 438, 13, 911, 29989, 29873, 29892, 7720, 29961, 3149, 29913, 353, 29871, 29900, 13, 7345, 270, 297, 1583, 29892, 29900, 262, 29901, 13, 29878, 486, 29918, 29886, 29900, 326, 29873, 353, 313, 29929, 29877, 326, 29873, 29961, 29900, 10062, 1311, 29889, 3972, 29961, 29877, 850, 29900, 511, 282, 29949, 524, 29961, 29896, 10062, 344, 29989, 29888, 29889, 3972, 29961, 29900, 3816, 29875, 5262, 13, 361, 29918, 6207, 19423, 273, 25363, 29941, 29898, 29881, 29897, 322, 1583, 29892, 629, 29896, 29900, 29961, 29885, 486, 29918, 29886, 29949, 262, 29974, 21540, 1360, 260, 601, 271, 10998, 524, 29374, 13, 6207, 29889, 29885, 283, 29872, 29898, 29881, 29913, 13, 29920, 761, 29889, 29069, 29898, 29876, 1193, 29918, 3149, 29952, 611, 29938, 357, 29897, 13, 8247, 29955, 29941, 29878, 29889, 29885, 29949, 345, 29898, 29879, 2152, 29889, 11569, 29918, 29881, 29991, 29878, 29961, 
29881, 1800, 13, 13, 1753, 289, 5847, 29898, 29945, 29872, 29989, 29888, 29892, 380, 29987, 2273, 1125, 13, 311, 29929, 434, 448, 313, 29949, 645, 29941, 1953, 29892, 311, 29939, 345, 4197, 29898, 303, 29946, 593, 29892, 29871, 29900, 29897, 2314, 13, 1332, 29991, 280, 270, 30181, 29939, 29894, 30181, 29901, 13, 3149, 29892, 4331, 353, 316, 802, 29889, 7323, 1563, 580, 13, 1896, 1298, 353, 29899, 29871, 29945, 761, 29889, 29873, 574, 29941, 29873, 29901, 13, 2457, 4331, 13, 29888, 29900, 29878, 270, 29918, 326, 1583, 29889, 29900, 262, 29889, 791, 1960, 7295, 13, 29876, 486, 29918, 3149, 29918, 29899, 313, 29886, 29900, 524, 29961, 29900, 29962, 29911, 29881, 29961, 29949, 1118, 29871, 29929, 29877, 29896, 593, 29961, 29896, 10062, 29881, 29961, 29896, 12258, 13, 277, 29918, 1311, 29892, 29929, 2429, 29961, 29876, 486, 29918, 1129, 262, 28135, 1275, 29871, 29900, 29901, 13, 29881, 30181, 802, 29892, 932, 8154, 3552, 29876, 486, 29918, 29939, 2461, 29892, 269, 29955, 1022, 29974, 29875, 876, 13, 29937, 286, 29992, 17697, 1998, 29991, 29974, 287, 2943, 2698, 448, 29896, 13, 1311, 29892, 5138, 333, 29961, 29876, 486, 29918, 3149, 29897, 353, 448, 29896, 29961, 29914, 25580, 29962, 1670, 526, 3196, 5626, 411, 596, 775, 763, 17422, 3459, 2983, 29892, 2743, 5132, 5877, 2992, 29889, 13, 13, 10605, 338, 920, 393, 775, 881, 1106, 29901, 13, 13, 28956, 4691, 13, 5215, 16250, 13, 13, 1990, 24380, 29901, 13, 1678, 822, 1284, 21322, 342, 2605, 29898, 1311, 29892, 5835, 29901, 525, 5756, 19203, 1495, 1599, 938, 29901, 13, 4706, 1583, 29889, 3972, 353, 11117, 29965, 2396, 21069, 29896, 29892, 29871, 29900, 1402, 29915, 29928, 2396, 518, 29896, 29892, 29871, 29900, 1402, 525, 29931, 2396, 518, 29900, 6653, 29896, 1402, 525, 29934, 2396, 518, 29900, 29892, 29871, 29896, 12258, 13, 4706, 1583, 29889, 11569, 29918, 3972, 353, 11117, 29965, 2396, 525, 29928, 742, 525, 29928, 2396, 525, 29965, 742, 525, 29931, 2396, 525, 29934, 742, 525, 29934, 2396, 525, 29931, 10827, 13, 4706, 1583, 
29889, 7720, 353, 16250, 29889, 4381, 8977, 29898, 2892, 29901, 7411, 877, 7192, 8785, 13, 4706, 396, 360, 9998, 2791, 714, 278, 2989, 2910, 13, 4706, 1583, 29889, 5182, 353, 6213, 13, 4706, 1583, 29889, 29069, 3552, 29900, 29892, 29871, 29900, 511, 5835, 29897, 13, 4706, 565, 1583, 29889, 5182, 338, 6213, 29901, 13, 9651, 736, 448, 29896, 13, 4706, 396, 350, 9998, 1284, 278, 3273, 342, 2224, 13, 4706, 736, 1583, 29889, 1635, 29879, 3552, 29900, 29892, 29871, 29900, 876, 13, 13, 1678, 822, 4489, 29879, 29898, 1311, 29892, 1298, 29892, 5835, 1125, 13, 4706, 565, 5835, 29889, 275, 8667, 7295, 13, 9651, 1583, 29889, 5182, 353, 1298, 13, 4706, 396, 2791, 16669, 2943, 408, 29871, 29900, 13, 4706, 1583, 29889, 7720, 29961, 3149, 29962, 353, 29871, 29900, 13, 4706, 363, 270, 297, 1583, 29889, 3972, 29901, 13, 9651, 302, 486, 29918, 3149, 353, 313, 3149, 29961, 29900, 10062, 1311, 29889, 3972, 29961, 29881, 3816, 29900, 1402, 1298, 29961, 29896, 10062, 1311, 29889, 3972, 29961, 29881, 3816, 29896, 2314, 13, 9651, 565, 5835, 29889, 3068, 16619, 29898, 29881, 29897, 322, 1583, 29889, 7720, 29961, 29876, 486, 29918, 3149, 29962, 1275, 5785, 877, 7192, 29374, 13, 18884, 5835, 29889, 11631, 29898, 29881, 29897, 13, 18884, 1583, 29889, 29069, 29898, 29876, 486, 29918, 3149, 29892, 5835, 29897, 13, 18884, 5835, 29889, 11631, 29898, 1311, 29889, 11569, 29918, 3972, 29961, 29881, 2314, 13, 13, 1678, 822, 289, 5847, 29898, 1311, 29892, 1369, 1125, 13, 4706, 316, 802, 353, 16250, 29889, 311, 802, 4197, 29898, 2962, 29892, 29871, 29900, 29897, 2314, 13, 4706, 1550, 316, 802, 29901, 13, 9651, 1298, 29892, 4331, 353, 316, 802, 29889, 7323, 1563, 580, 13, 9651, 565, 1298, 1275, 1583, 29889, 5182, 29901, 13, 18884, 736, 4331, 13, 9651, 363, 270, 297, 1583, 29889, 3972, 29889, 5975, 7295, 13, 18884, 302, 486, 29918, 3149, 353, 313, 3149, 29961, 29900, 10062, 29881, 29961, 29900, 1402, 1298, 29961, 29896, 10062, 29881, 29961, 29896, 2314, 13, 18884, 565, 1583, 29889, 7720, 29961, 29876, 
486, 29918, 3149, 29962, 1275, 29871, 29900, 29901, 13, 462, 1678, 316, 802, 29889, 4397, 3552, 29876, 486, 29918, 3149, 29892, 4331, 29974, 29896, 876, 13, 462, 1678, 396, 2791, 16669, 2943, 408, 448, 29896, 13, 462, 1678, 1583, 29889, 7720, 29961, 29876, 486, 29918, 3149, 29962, 353, 448, 29896, 13, 28956, 13, 12148, 4443, 393, 366, 3282, 29915, 29873, 3867, 278, 421, 5756, 19203, 29952, 770, 29892, 8151, 366, 505, 304, 1207, 1854, 393, 445, 4413, 505, 1906, 421, 275, 8667, 1673, 421, 3068, 16619, 29952, 322, 421, 11631, 29952, 3519, 297, 1797, 445, 775, 304, 664,
29889, 2] inputs: [INST]i've got this python code from an ocr tool, but it's not working. can you debug it for me? class 5olv7!on: oet tindShort€stPatn($elf` nn&s7er: 'GridM&s+er') -< 1rt: self.o1r = {'U': [-1, 0]._'0': [1, 0}, 'L': [0,_-1]. 'R': {0, 1]} selt.inv_d!r - {'V': 'D', 'D': 'U', 'L': 'R', 'R': 'L'} z€|f.grio = (ollectioms.defaultdic7(lambda:_flo4t('lnf']) # DFS mark out the full map s€|f.targe+ - Non€ 5elf.dfs((0, 0), m&st3r) 1f $eit,t4rget_is Rone: retvrn -1 H 8FS find the sn0rtes7 9@+h neturr self.bfs((O, o)) det dfs(zelf, point, mas+en): if_master,i5Target(): self.target - poimt # nnark visited node as O ze|t,grid[point} = 0 tor d in self,0in: rxt_p0imt = (9oimt[0]+self.dir[o](0), pOint[1]+se|f.dir[0][i]] if_master. int: self.dir = {'U': [-1, 0],'D': [1, 0], 'L': [0,-1], 'R': [0, 1]} self.inv_dir = {'U': 'D', 'D': 'U', 'L': 'R', 'R': 'L'} self.grid = collections.defaultdict(lambda:float('inf')) # DFS mark out the full map self.target = None self.dfs((0, 0), master) if self.target is None: return -1 # BFS find the shortest path return self.bfs((0, 0)) def dfs(self, point, master): if master.isTarget(): self.target = point # mark visited node as 0 self.grid[point] = 0 for d in self.dir: nxt_point = (point[0]+self.dir[d][0], point[1]+self.dir[d][1]) if master.canMove(d) and self.grid[nxt_point] == float('inf'): master.move(d) self.dfs(nxt_point, master) master.move(self.inv_dir[d]) def bfs(self, start): deque = collections.deque([(start, 0)]) while deque: point, step = deque.popleft() if point == self.target: return step for d in self.dir.values(): nxt_point = (point[0]+d[0], point[1]+d[1]) if self.grid[nxt_point] == 0: deque.append((nxt_point, step+1)) # mark visited node as -1 self.grid[nxt_point] = -1 ``` Please note that you didn't provide the `GridMaster` class, hence you have to make sure that this classes have those `isTarget`, `canMove` and `move` methods in order this code to work. 
label_ids: [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 1670, 526, 3196, 5626, 411, 596, 775, 763, 17422, 3459, 2983, 29892, 2743, 5132, 5877, 2992, 29889, 13, 13, 
10605, 338, 920, 393, 775, 881, 1106, 29901, 13, 13, 28956, 4691, 13, 5215, 16250, 13, 13, 1990, 24380, 29901, 13, 1678, 822, 1284, 21322, 342, 2605, 29898, 1311, 29892, 5835, 29901, 525, 5756, 19203, 1495, 1599, 938, 29901, 13, 4706, 1583, 29889, 3972, 353, 11117, 29965, 2396, 21069, 29896, 29892, 29871, 29900, 1402, 29915, 29928, 2396, 518, 29896, 29892, 29871, 29900, 1402, 525, 29931, 2396, 518, 29900, 6653, 29896, 1402, 525, 29934, 2396, 518, 29900, 29892, 29871, 29896, 12258, 13, 4706, 1583, 29889, 11569, 29918, 3972, 353, 11117, 29965, 2396, 525, 29928, 742, 525, 29928, 2396, 525, 29965, 742, 525, 29931, 2396, 525, 29934, 742, 525, 29934, 2396, 525, 29931, 10827, 13, 4706, 1583, 29889, 7720, 353, 16250, 29889, 4381, 8977, 29898, 2892, 29901, 7411, 877, 7192, 8785, 13, 4706, 396, 360, 9998, 2791, 714, 278, 2989, 2910, 13, 4706, 1583, 29889, 5182, 353, 6213, 13, 4706, 1583, 29889, 29069, 3552, 29900, 29892, 29871, 29900, 511, 5835, 29897, 13, 4706, 565, 1583, 29889, 5182, 338, 6213, 29901, 13, 9651, 736, 448, 29896, 13, 4706, 396, 350, 9998, 1284, 278, 3273, 342, 2224, 13, 4706, 736, 1583, 29889, 1635, 29879, 3552, 29900, 29892, 29871, 29900, 876, 13, 13, 1678, 822, 4489, 29879, 29898, 1311, 29892, 1298, 29892, 5835, 1125, 13, 4706, 565, 5835, 29889, 275, 8667, 7295, 13, 9651, 1583, 29889, 5182, 353, 1298, 13, 4706, 396, 2791, 16669, 2943, 408, 29871, 29900, 13, 4706, 1583, 29889, 7720, 29961, 3149, 29962, 353, 29871, 29900, 13, 4706, 363, 270, 297, 1583, 29889, 3972, 29901, 13, 9651, 302, 486, 29918, 3149, 353, 313, 3149, 29961, 29900, 10062, 1311, 29889, 3972, 29961, 29881, 3816, 29900, 1402, 1298, 29961, 29896, 10062, 1311, 29889, 3972, 29961, 29881, 3816, 29896, 2314, 13, 9651, 565, 5835, 29889, 3068, 16619, 29898, 29881, 29897, 322, 1583, 29889, 7720, 29961, 29876, 486, 29918, 3149, 29962, 1275, 5785, 877, 7192, 29374, 13, 18884, 5835, 29889, 11631, 29898, 29881, 29897, 13, 18884, 1583, 29889, 29069, 29898, 29876, 486, 29918, 3149, 29892, 5835, 29897, 13, 
18884, 5835, 29889, 11631, 29898, 1311, 29889, 11569, 29918, 3972, 29961, 29881, 2314, 13, 13, 1678, 822, 289, 5847, 29898, 1311, 29892, 1369, 1125, 13, 4706, 316, 802, 353, 16250, 29889, 311, 802, 4197, 29898, 2962, 29892, 29871, 29900, 29897, 2314, 13, 4706, 1550, 316, 802, 29901, 13, 9651, 1298, 29892, 4331, 353, 316, 802, 29889, 7323, 1563, 580, 13, 9651, 565, 1298, 1275, 1583, 29889, 5182, 29901, 13, 18884, 736, 4331, 13, 9651, 363, 270, 297, 1583, 29889, 3972, 29889, 5975, 7295, 13, 18884, 302, 486, 29918, 3149, 353, 313, 3149, 29961, 29900, 10062, 29881, 29961, 29900, 1402, 1298, 29961, 29896, 10062, 29881, 29961, 29896, 2314, 13, 18884, 565, 1583, 29889, 7720, 29961, 29876, 486, 29918, 3149, 29962, 1275, 29871, 29900, 29901, 13, 462, 1678, 316, 802, 29889, 4397, 3552, 29876, 486, 29918, 3149, 29892, 4331, 29974, 29896, 876, 13, 462, 1678, 396, 2791, 16669, 2943, 408, 448, 29896, 13, 462, 1678, 1583, 29889, 7720, 29961, 29876, 486, 29918, 3149, 29962, 353, 448, 29896, 13, 28956, 13, 12148, 4443, 393, 366, 3282, 29915, 29873, 3867, 278, 421, 5756, 19203, 29952, 770, 29892, 8151, 366, 505, 304, 1207, 1854, 393, 445, 4413, 505, 1906, 421, 275, 8667, 1673, 421, 3068, 16619, 29952, 322, 421, 11631, 29952, 3519, 297, 1797, 445, 775, 304, 664, 29889, 2] inputs: [INST]Please amend the subsequent Python script so that it includes a 'while' loop rather than the existing 'for' loop, which iterates through the items of an integer list. The script currently has a bug where it attempts to print an object that is outside the bounds of the list. Fix this error and modify the script to use 'while' instead of 'for' loop. Ensure your script correctly handles empty lists. 
```python # Establish an integer list arr = [1, 2, 3, 4] # Determine the length of the list n = len(arr) # Traverse the list and output each individual element for i in range(n+1): print(arr[i]) ```[/INST] ```python # Establish an integer list arr = [1, 2, 3, 4] # Determine the length of the list n = len(arr) # Initialize index at 0 i = 0 # Traverse the list and output each individual element while i < n: print(arr[i]) i += 1 ``` In the given code, it tries to access `arr[n]` which is out of bounds as python list indexing starts from 0 and ends at n-1 for a list of length n. Also, I used `while` loop as per your request. By initializing `i=0` before the loop and `i += 1` within the loop, we can iterate over the list. This iteration continues until `i` is less than `n`, the length of the list. The `while` loop will not proceed if the list is empty so it correctly handles empty lists. label_idsinputs: [INST]i've got this python code from an ocr tool, but it's not working. can you debug it for me? class 5olv7!on: oet tindShort€stPatn($elf` nn&s7er: 'GridM&s+er') -< 1rt: self.o1r = {'U': [-1, 0]._'0': [1, 0}, 'L': [0,_-1]. 'R': {0, 1]} selt.inv_d!r - {'V': 'D', 'D': 'U', 'L': 'R', 'R': 'L'} z€|f.grio = (ollectioms.defaultdic7(lambda:_flo4t('lnf']) # DFS mark out the full map s€|f.targe+ - Non€ 5elf.dfs((0, 0), m&st3r) 1f $eit,t4rget_is Rone: retvrn -1 H 8FS find the sn0rtes7 9@+h neturr self.bfs((O, o)) det dfs(zelf, point, mas+en): if_master,i5Target(): self.target - poimt # nnark visited node as O ze|t,grid[point} = 0 tor d in self,0in: rxt_p0imt = (9oimt[0]+self.dir[o](0), pOint[1]+se|f.dir[0][i]] if_master. 
int: self.dir = {'U': [-1, 0],'D': [1, 0], 'L': [0,-1], 'R': [0, 1]} self.inv_dir = {'U': 'D', 'D': 'U', 'L': 'R', 'R': 'L'} self.grid = collections.defaultdict(lambda:float('inf')) # DFS mark out the full map self.target = None self.dfs((0, 0), master) if self.target is None: return -1 # BFS find the shortest path return self.bfs((0, 0)) def dfs(self, point, master): if master.isTarget(): self.target = point # mark visited node as 0 self.grid[point] = 0 for d in self.dir: nxt_point = (point[0]+self.dir[d][0], point[1]+self.dir[d][1]) if master.canMove(d) and self.grid[nxt_point] == float('inf'): master.move(d) self.dfs(nxt_point, master) master.move(self.inv_dir[d]) def bfs(self, start): deque = collections.deque([(start, 0)]) while deque: point, step = deque.popleft() if point == self.target: return step for d in self.dir.values(): nxt_point = (point[0]+d[0], point[1]+d[1]) if self.grid[nxt_point] == 0: deque.append((nxt_point, step+1)) # mark visited node as -1 self.grid[nxt_point] = -1 ``` Please note that you didn't provide the `GridMaster` class, hence you have to make sure that this classes have those `isTarget`, `canMove` and `move` methods in order this code to work. label_idslabels```python # Establish an integer list arr = [1, 2, 3, 4] # Determine the length of the list n = len(arr) # Initialize index at 0 i = 0 # Traverse the list and output each individual element while i < n: print(arr[i]) i += 1 ``` In the given code, it tries to access `arr[n]` which is out of bounds as python list indexing starts from 0 and ends at n-1 for a list of length n. Also, I used `while` loop as per your request. By initializing `i=0` before the loop and `i += 1` within the loop, we can iterate over the list. This iteration continues until `i` is less than `n`, the length of the list. The `while` loop will not proceed if the list is empty so it correctly handles empty lists. inputs: [INST]i've got this python code from an ocr tool, but it's not working. 
can you debug it for me? class 5olv7!on: oet tindShort€stPatn($elf` nn&s7er: 'GridM&s+er') -< 1rt: self.o1r = {'U': [-1, 0]._'0': [1, 0}, 'L': [0,_-1]. 'R': {0, 1]} selt.inv_d!r - {'V': 'D', 'D': 'U', 'L': 'R', 'R': 'L'} z€|f.grio = (ollectioms.defaultdic7(lambda:_flo4t('lnf']) # DFS mark out the full map s€|f.targe+ - Non€ 5elf.dfs((0, 0), m&st3r) 1f $eit,t4rget_is Rone: retvrn -1 H 8FS find the sn0rtes7 9@+h neturr self.bfs((O, o)) det dfs(zelf, point, mas+en): if_master,i5Target(): self.target - poimt # nnark visited node as O ze|t,grid[point} = 0 tor d in self,0in: rxt_p0imt = (9oimt[0]+self.dir[o](0), pOint[1]+se|f.dir[0][i]] if_master. int: self.dir = {'U': [-1, 0],'D': [1, 0], 'L': [0,-1], 'R': [0, 1]} self.inv_dir = {'U': 'D', 'D': 'U', 'L': 'R', 'R': 'L'} self.grid = collections.defaultdict(lambda:float('inf')) # DFS mark out the full map self.target = None self.dfs((0, 0), master) if self.target is None: return -1 # BFS find the shortest path return self.bfs((0, 0)) def dfs(self, point, master): if master.isTarget(): self.target = point # mark visited node as 0 self.grid[point] = 0 for d in self.dir: nxt_point = (point[0]+self.dir[d][0], point[1]+self.dir[d][1]) if master.canMove(d) and self.grid[nxt_point] == float('inf'): master.move(d) self.dfs(nxt_point, master) master.move(self.inv_dir[d]) def bfs(self, start): deque = collections.deque([(start, 0)]) while deque: point, step = deque.popleft() if point == self.target: return step for d in self.dir.values(): nxt_point = (point[0]+d[0], point[1]+d[1]) if self.grid[nxt_point] == 0: deque.append((nxt_point, step+1)) # mark visited node as -1 self.grid[nxt_point] = -1 ``` Please note that you didn't provide the `GridMaster` class, hence you have to make sure that this classes have those `isTarget`, `canMove` and `move` methods in order this code to work. label_idslabelshere are several issues with your code like flaky names, wrong Python syntax etc. 
Here is how that code should look:

```python
import collections

class Solution:
    def findShortestPath(self, master: 'GridMaster') -> int:
        self.dir = {'U': [-1, 0],'D': [1, 0], 'L': [0,-1], 'R': [0, 1]}
        self.inv_dir = {'U': 'D', 'D': 'U', 'L': 'R', 'R': 'L'}
        self.grid = collections.defaultdict(lambda:float('inf'))
        # DFS mark out the full map
        self.target = None
        self.dfs((0, 0), master)
        if self.target is None:
            return -1
        # BFS find the shortest path
        return self.bfs((0, 0))

    def dfs(self, point, master):
        if master.isTarget():
            self.target = point
        # mark visited node as 0
        self.grid[point] = 0
        for d in self.dir:
            nxt_point = (point[0]+self.dir[d][0], point[1]+self.dir[d][1])
            if master.canMove(d) and self.grid[nxt_point] == float('inf'):
                master.move(d)
                self.dfs(nxt_point, master)
                master.move(self.inv_dir[d])

    def bfs(self, start):
        deque = collections.deque([(start, 0)])
        while deque:
            point, step = deque.popleft()
            if point == self.target:
                return step
            for d in self.dir.values():
                nxt_point = (point[0]+d[0], point[1]+d[1])
                if self.grid[nxt_point] == 0:
                    deque.append((nxt_point, step+1))
                    # mark visited node as -1
                    self.grid[nxt_point] = -1
```
Please note that you didn't provide the `GridMaster` class, hence you have to make sure that this classes have those `isTarget`, `canMove` and `move` methods in order this code to work.
length of input_ids : 1212 input_ids: [1, 518, 25580, 29962, 29875, 29915, 345, 2355, 445, 3017, 775, 515, 385, 288, 7283, 5780, 29892, 541, 372, 29915, 29879, 451, 1985, 29889, 508, 366, 4744, 372, 363, 592, 29973, 13, 1990, 29871, 29945, 324, 29894, 29955, 29991, 265, 29901, 13, 29877, 300, 260, 513, 21322, 30181, 303, 11457, 29876, 1566, 761, 29952, 302, 29876, 29987, 29879, 29955, 261, 29901, 525, 5756, 29924, 29987, 29879, 29974, 261, 1495, 448, 29966, 29871, 29896, 2273, 29901, 13, 1311, 29889, 29877, 29896, 29878, 353, 11117, 29965, 2396, 21069, 29896, 29892, 29871, 29900, 1822, 29918, 29915, 29900, 2396, 518, 29896, 29892, 29871, 29900, 1118, 525, 29931, 2396, 518, 29900, 29892, 29918, 29899, 29896, 1822, 525, 29934, 2396, 426, 29900, 29892, 29871, 29896, 12258, 13, 29879, 2152, 29889, 11569, 29918, 29881, 29991, 29878, 448, 11117, 29963, 2396, 525, 29928, 742, 525, 29928, 2396, 525, 29965, 742, 525, 29931, 2396, 525, 29934, 742, 525, 29934, 2396, 525, 29931, 10827, 13, 29920, 30181, 29989, 29888, 29889, 29887, 5378, 353, 313, 324, 781, 29875, 4835, 29889, 4381, 27774, 29955, 29898, 2892, 29901, 29918, 29888, 417, 29946, 29873, 877, 3083, 29888, 11287, 13, 29937, 360, 9998, 2791, 714, 278, 2989, 2910, 13, 29879, 30181, 29989, 29888, 29889, 12637, 479, 29974, 448, 10050, 30181, 13, 29945, 761, 29889, 29069, 3552, 29900, 29892, 29871, 29900, 511, 286, 29987, 303, 29941, 29878, 29897, 13, 29896, 29888, 395, 29872, 277, 29892, 29873, 29946, 29878, 657, 29918, 275, 390, 650, 29901, 13, 2267, 13416, 29876, 448, 29896, 13, 29950, 29871, 29947, 9998, 1284, 278, 5807, 29900, 29878, 2167, 29955, 29871, 29929, 29992, 29974, 29882, 13, 1212, 1038, 1583, 29889, 1635, 29879, 3552, 29949, 29892, 288, 876, 13, 13, 4801, 4489, 29879, 29898, 29920, 761, 29892, 1298, 29892, 5516, 29974, 264, 1125, 13, 361, 29918, 6207, 29892, 29875, 29945, 8667, 7295, 13, 1311, 29889, 5182, 448, 772, 326, 29873, 13, 29937, 302, 29876, 935, 16669, 2943, 408, 438, 13, 911, 29989, 29873, 29892, 
7720, 29961, 3149, 29913, 353, 29871, 29900, 13, 7345, 270, 297, 1583, 29892, 29900, 262, 29901, 13, 29878, 486, 29918, 29886, 29900, 326, 29873, 353, 313, 29929, 29877, 326, 29873, 29961, 29900, 10062, 1311, 29889, 3972, 29961, 29877, 850, 29900, 511, 282, 29949, 524, 29961, 29896, 10062, 344, 29989, 29888, 29889, 3972, 29961, 29900, 3816, 29875, 5262, 13, 361, 29918, 6207, 19423, 273, 25363, 29941, 29898, 29881, 29897, 322, 1583, 29892, 629, 29896, 29900, 29961, 29885, 486, 29918, 29886, 29949, 262, 29974, 21540, 1360, 260, 601, 271, 10998, 524, 29374, 13, 6207, 29889, 29885, 283, 29872, 29898, 29881, 29913, 13, 29920, 761, 29889, 29069, 29898, 29876, 1193, 29918, 3149, 29952, 611, 29938, 357, 29897, 13, 8247, 29955, 29941, 29878, 29889, 29885, 29949, 345, 29898, 29879, 2152, 29889, 11569, 29918, 29881, 29991, 29878, 29961, 29881, 1800, 13, 13, 1753, 289, 5847, 29898, 29945, 29872, 29989, 29888, 29892, 380, 29987, 2273, 1125, 13, 311, 29929, 434, 448, 313, 29949, 645, 29941, 1953, 29892, 311, 29939, 345, 4197, 29898, 303, 29946, 593, 29892, 29871, 29900, 29897, 2314, 13, 1332, 29991, 280, 270, 30181, 29939, 29894, 30181, 29901, 13, 3149, 29892, 4331, 353, 316, 802, 29889, 7323, 1563, 580, 13, 1896, 1298, 353, 29899, 29871, 29945, 761, 29889, 29873, 574, 29941, 29873, 29901, 13, 2457, 4331, 13, 29888, 29900, 29878, 270, 29918, 326, 1583, 29889, 29900, 262, 29889, 791, 1960, 7295, 13, 29876, 486, 29918, 3149, 29918, 29899, 313, 29886, 29900, 524, 29961, 29900, 29962, 29911, 29881, 29961, 29949, 1118, 29871, 29929, 29877, 29896, 593, 29961, 29896, 10062, 29881, 29961, 29896, 12258, 13, 277, 29918, 1311, 29892, 29929, 2429, 29961, 29876, 486, 29918, 1129, 262, 28135, 1275, 29871, 29900, 29901, 13, 29881, 30181, 802, 29892, 932, 8154, 3552, 29876, 486, 29918, 29939, 2461, 29892, 269, 29955, 1022, 29974, 29875, 876, 13, 29937, 286, 29992, 17697, 1998, 29991, 29974, 287, 2943, 2698, 448, 29896, 13, 1311, 29892, 5138, 333, 29961, 29876, 486, 29918, 3149, 29897, 353, 448, 
29896, 29961, 29914, 25580, 29962, 1670, 526, 3196, 5626, 411, 596, 775, 763, 17422, 3459, 2983, 29892, 2743, 5132, 5877, 2992, 29889, 13, 13, 10605, 338, 920, 393, 775, 881, 1106, 29901, 13, 13, 28956, 4691, 13, 5215, 16250, 13, 13, 1990, 24380, 29901, 13, 1678, 822, 1284, 21322, 342, 2605, 29898, 1311, 29892, 5835, 29901, 525, 5756, 19203, 1495, 1599, 938, 29901, 13, 4706, 1583, 29889, 3972, 353, 11117, 29965, 2396, 21069, 29896, 29892, 29871, 29900, 1402, 29915, 29928, 2396, 518, 29896, 29892, 29871, 29900, 1402, 525, 29931, 2396, 518, 29900, 6653, 29896, 1402, 525, 29934, 2396, 518, 29900, 29892, 29871, 29896, 12258, 13, 4706, 1583, 29889, 11569, 29918, 3972, 353, 11117, 29965, 2396, 525, 29928, 742, 525, 29928, 2396, 525, 29965, 742, 525, 29931, 2396, 525, 29934, 742, 525, 29934, 2396, 525, 29931, 10827, 13, 4706, 1583, 29889, 7720, 353, 16250, 29889, 4381, 8977, 29898, 2892, 29901, 7411, 877, 7192, 8785, 13, 4706, 396, 360, 9998, 2791, 714, 278, 2989, 2910, 13, 4706, 1583, 29889, 5182, 353, 6213, 13, 4706, 1583, 29889, 29069, 3552, 29900, 29892, 29871, 29900, 511, 5835, 29897, 13, 4706, 565, 1583, 29889, 5182, 338, 6213, 29901, 13, 9651, 736, 448, 29896, 13, 4706, 396, 350, 9998, 1284, 278, 3273, 342, 2224, 13, 4706, 736, 1583, 29889, 1635, 29879, 3552, 29900, 29892, 29871, 29900, 876, 13, 13, 1678, 822, 4489, 29879, 29898, 1311, 29892, 1298, 29892, 5835, 1125, 13, 4706, 565, 5835, 29889, 275, 8667, 7295, 13, 9651, 1583, 29889, 5182, 353, 1298, 13, 4706, 396, 2791, 16669, 2943, 408, 29871, 29900, 13, 4706, 1583, 29889, 7720, 29961, 3149, 29962, 353, 29871, 29900, 13, 4706, 363, 270, 297, 1583, 29889, 3972, 29901, 13, 9651, 302, 486, 29918, 3149, 353, 313, 3149, 29961, 29900, 10062, 1311, 29889, 3972, 29961, 29881, 3816, 29900, 1402, 1298, 29961, 29896, 10062, 1311, 29889, 3972, 29961, 29881, 3816, 29896, 2314, 13, 9651, 565, 5835, 29889, 3068, 16619, 29898, 29881, 29897, 322, 1583, 29889, 7720, 29961, 29876, 486, 29918, 3149, 29962, 1275, 5785, 877, 7192, 
29374, 13, 18884, 5835, 29889, 11631, 29898, 29881, 29897, 13, 18884, 1583, 29889, 29069, 29898, 29876, 486, 29918, 3149, 29892, 5835, 29897, 13, 18884, 5835, 29889, 11631, 29898, 1311, 29889, 11569, 29918, 3972, 29961, 29881, 2314, 13, 13, 1678, 822, 289, 5847, 29898, 1311, 29892, 1369, 1125, 13, 4706, 316, 802, 353, 16250, 29889, 311, 802, 4197, 29898, 2962, 29892, 29871, 29900, 29897, 2314, 13, 4706, 1550, 316, 802, 29901, 13, 9651, 1298, 29892, 4331, 353, 316, 802, 29889, 7323, 1563, 580, 13, 9651, 565, 1298, 1275, 1583, 29889, 5182, 29901, 13, 18884, 736, 4331, 13, 9651, 363, 270, 297, 1583, 29889, 3972, 29889, 5975, 7295, 13, 18884, 302, 486, 29918, 3149, 353, 313, 3149, 29961, 29900, 10062, 29881, 29961, 29900, 1402, 1298, 29961, 29896, 10062, 29881, 29961, 29896, 2314, 13, 18884, 565, 1583, 29889, 7720, 29961, 29876, 486, 29918, 3149, 29962, 1275, 29871, 29900, 29901, 13, 462, 1678, 316, 802, 29889, 4397, 3552, 29876, 486, 29918, 3149, 29892, 4331, 29974, 29896, 876, 13, 462, 1678, 396, 2791, 16669, 2943, 408, 448, 29896, 13, 462, 1678, 1583, 29889, 7720, 29961, 29876, 486, 29918, 3149, 29962, 353, 448, 29896, 13, 28956, 13, 12148, 4443, 393, 366, 3282, 29915, 29873, 3867, 278, 421, 5756, 19203, 29952, 770, 29892, 8151, 366, 505, 304, 1207, 1854, 393, 445, 4413, 505, 1906, 421, 275, 8667, 1673, 421, 3068, 16619, 29952, 322, 421, 11631, 29952, 3519, 297, 1797, 445, 775, 304, 664, 29889, 2] inputs: [INST]i've got this python code from an ocr tool, but it's not working. can you debug it for me? class 5olv7!on: oet tindShort€stPatn($elf` nn&s7er: 'GridM&s+er') -< 1rt: self.o1r = {'U': [-1, 0]._'0': [1, 0}, 'L': [0,_-1]. 
'R': {0, 1]} selt.inv_d!r - {'V': 'D', 'D': 'U', 'L': 'R', 'R': 'L'} z€|f.grio = (ollectioms.defaultdic7(lambda:_flo4t('lnf']) # DFS mark out the full map s€|f.targe+ - Non€ 5elf.dfs((0, 0), m&st3r) 1f $eit,t4rget_is Rone: retvrn -1 H 8FS find the sn0rtes7 9@+h neturr self.bfs((O, o)) det dfs(zelf, point, mas+en): if_master,i5Target(): self.target - poimt # nnark visited node as O ze|t,grid[point} = 0 tor d in self,0in: rxt_p0imt = (9oimt[0]+self.dir[o](0), pOint[1]+se|f.dir[0][i]] if_master. int: self.dir = {'U': [-1, 0],'D': [1, 0], 'L': [0,-1], 'R': [0, 1]} self.inv_dir = {'U': 'D', 'D': 'U', 'L': 'R', 'R': 'L'} self.grid = collections.defaultdict(lambda:float('inf')) # DFS mark out the full map self.target = None self.dfs((0, 0), master) if self.target is None: return -1 # BFS find the shortest path return self.bfs((0, 0)) def dfs(self, point, master): if master.isTarget(): self.target = point # mark visited node as 0 self.grid[point] = 0 for d in self.dir: nxt_point = (point[0]+self.dir[d][0], point[1]+self.dir[d][1]) if master.canMove(d) and self.grid[nxt_point] == float('inf'): master.move(d) self.dfs(nxt_point, master) master.move(self.inv_dir[d]) def bfs(self, start): deque = collections.deque([(start, 0)]) while deque: point, step = deque.popleft() if point == self.target: return step for d in self.dir.values(): nxt_point = (point[0]+d[0], point[1]+d[1]) if self.grid[nxt_point] == 0: deque.append((nxt_point, step+1)) # mark visited node as -1 self.grid[nxt_point] = -1 ``` Please note that you didn't provide the `GridMaster` class, hence you have to make sure that this classes have those `isTarget`, `canMove` and `move` methods in order this code to work. label_idslabelshere are several issues with your code like flaky names, wrong Python syntax etc. 
Here is how that code should look: ```python import collections class Solution: def findShortestPath(self, master: 'GridMaster') -> int: self.dir = {'U': [-1, 0],'D': [1, 0], 'L': [0,-1], 'R': [0, 1]} self.inv_dir = {'U': 'D', 'D': 'U', 'L': 'R', 'R': 'L'} self.grid = collections.defaultdict(lambda:float('inf')) # DFS mark out the full map self.target = None self.dfs((0, 0), master) if self.target is None: return -1 # BFS find the shortest path return self.bfs((0, 0)) def dfs(self, point, master): if master.isTarget(): self.target = point # mark visited node as 0 self.grid[point] = 0 for d in self.dir: nxt_point = (point[0]+self.dir[d][0], point[1]+self.dir[d][1]) if master.canMove(d) and self.grid[nxt_point] == float('inf'): master.move(d) self.dfs(nxt_point, master) master.move(self.inv_dir[d]) def bfs(self, start): deque = collections.deque([(start, 0)]) while deque: point, step = deque.popleft() if point == self.target: return step for d in self.dir.values(): nxt_point = (point[0]+d[0], point[1]+d[1]) if self.grid[nxt_point] == 0: deque.append((nxt_point, step+1)) # mark visited node as -1 self.grid[nxt_point] = -1 ``` Please note that you didn't provide the `GridMaster` class, hence you have to make sure that this classes have those `isTarget`, `canMove` and `move` methods in order this code to work.
label_idslength of input_ids : 394 input_ids: [1, 518, 25580, 29962, 12148, 626, 355, 278, 15352, 5132, 2471, 577, 393, 372, 7805, 263, 525, 8000, 29915, 2425, 3265, 1135, 278, 5923, 525, 1454, 29915, 2425, 29892, 607, 4256, 1078, 1549, 278, 4452, 310, 385, 6043, 1051, 29889, 13, 13, 1576, 2471, 5279, 756, 263, 6494, 988, 372, 14734, 304, 1596, 385, 1203, 393, 338, 5377, 278, 13451, 310, 278, 1051, 29889, 24778, 445, 1059, 322, 6623, 278, 2471, 304, 671, 525, 8000, 29915, 2012, 310, 525, 1454, 29915, 2425, 29889, 22521, 545, 596, 2471, 5149, 17766, 4069, 8857, 29889, 29871, 13, 13, 28956, 4691, 13, 29871, 396, 2661, 370, 1674, 385, 6043, 1051, 13, 29871, 3948, 353, 518, 29896, 29892, 29871, 29906, 29892, 29871, 29941, 29892, 29871, 29946, 29962, 13, 13, 29871, 396, 5953, 837, 457, 278, 3309, 310, 278, 1051, 13, 29871, 302, 353, 7431, 29898, 2749, 29897, 13, 13, 29871, 396, 3201, 3901, 278, 1051, 322, 1962, 1269, 5375, 1543, 13, 29871, 363, 474, 297, 3464, 29898, 29876, 29974, 29896, 1125, 13, 418, 1596, 29898, 2749, 29961, 29875, 2314, 13, 28956, 29961, 29914, 25580, 29962, 7521, 4691, 13, 29937, 2661, 370, 1674, 385, 6043, 1051, 13, 2749, 353, 518, 29896, 29892, 29871, 29906, 29892, 29871, 29941, 29892, 29871, 29946, 29962, 13, 13, 29937, 5953, 837, 457, 278, 3309, 310, 278, 1051, 13, 29876, 353, 7431, 29898, 2749, 29897, 13, 13, 29937, 25455, 2380, 472, 29871, 29900, 13, 29875, 353, 29871, 29900, 13, 13, 29937, 3201, 3901, 278, 1051, 322, 1962, 1269, 5375, 1543, 13, 8000, 474, 529, 302, 29901, 13, 1678, 1596, 29898, 2749, 29961, 29875, 2314, 13, 1678, 474, 4619, 29871, 29896, 13, 28956, 13, 797, 278, 2183, 775, 29892, 372, 14335, 304, 2130, 421, 2749, 29961, 29876, 7961, 607, 338, 714, 310, 13451, 408, 3017, 1051, 26190, 8665, 515, 29871, 29900, 322, 10614, 472, 302, 29899, 29896, 363, 263, 1051, 310, 3309, 302, 29889, 13, 17351, 29892, 306, 1304, 421, 8000, 29952, 2425, 408, 639, 596, 2009, 29889, 2648, 2847, 5281, 421, 29875, 29922, 29900, 29952, 1434, 278, 
2425, 322, 421, 29875, 4619, 29871, 29896, 29952, 2629, 278, 2425, 29892, 591, 508, 13649, 975, 278, 1051, 29889, 910, 12541, 18172, 2745, 421, 29875, 29952, 338, 3109, 1135, 421, 29876, 1673, 278, 3309, 310, 278, 1051, 29889, 450, 421, 8000, 29952, 2425, 674, 451, 8469, 565, 278, 1051, 338, 4069, 577, 372, 5149, 17766, 4069, 8857, 29889, 2] labelshere are several issues with your code like flaky names, wrong Python syntax etc. Here is how that code should look: ```python import collections class Solution: def findShortestPath(self, master: 'GridMaster') -> int: self.dir = {'U': [-1, 0],'D': [1, 0], 'L': [0,-1], 'R': [0, 1]} self.inv_dir = {'U': 'D', 'D': 'U', 'L': 'R', 'R': 'L'} self.grid = collections.defaultdict(lambda:float('inf')) # DFS mark out the full map self.target = None self.dfs((0, 0), master) if self.target is None: return -1 # BFS find the shortest path return self.bfs((0, 0)) def dfs(self, point, master): if master.isTarget(): self.target = point # mark visited node as 0 self.grid[point] = 0 for d in self.dir: nxt_point = (point[0]+self.dir[d][0], point[1]+self.dir[d][1]) if master.canMove(d) and self.grid[nxt_point] == float('inf'): master.move(d) self.dfs(nxt_point, master) master.move(self.inv_dir[d]) def bfs(self, start): deque = collections.deque([(start, 0)]) while deque: point, step = deque.popleft() if point == self.target: return step for d in self.dir.values(): nxt_point = (point[0]+d[0], point[1]+d[1]) if self.grid[nxt_point] == 0: deque.append((nxt_point, step+1)) # mark visited node as -1 self.grid[nxt_point] = -1 ``` Please note that you didn't provide the `GridMaster` class, hence you have to make sure that this classes have those `isTarget`, `canMove` and `move` methods in order this code to work. inputs: [INST]Please amend the subsequent Python script so that it includes a 'while' loop rather than the existing 'for' loop, which iterates through the items of an integer list. 
The script currently has a bug where it attempts to print an object that is outside the bounds of the list. Fix this error and modify the script to use 'while' instead of 'for' loop. Ensure your script correctly handles empty lists. ```python # Establish an integer list arr = [1, 2, 3, 4] # Determine the length of the list n = len(arr) # Traverse the list and output each individual element for i in range(n+1): print(arr[i]) ```[/INST] ```python # Establish an integer list arr = [1, 2, 3, 4] # Determine the length of the list n = len(arr) # Initialize index at 0 i = 0 # Traverse the list and output each individual element while i < n: print(arr[i]) i += 1 ``` In the given code, it tries to access `arr[n]` which is out of bounds as python list indexing starts from 0 and ends at n-1 for a list of length n. Also, I used `while` loop as per your request. By initializing `i=0` before the loop and `i += 1` within the loop, we can iterate over the list. This iteration continues until `i` is less than `n`, the length of the list. The `while` loop will not proceed if the list is empty so it correctly handles empty lists. 
label_idslabels: [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] 
[PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] There are several issues with your code like flaky names, wrong Python syntax etc. 
Here is how that code should look: ```python import collections class Solution: def findShortestPath(self, master: 'GridMaster') -> int: self.dir = {'U': [-1, 0],'D': [1, 0], 'L': [0,-1], 'R': [0, 1]} self.inv_dir = {'U': 'D', 'D': 'U', 'L': 'R', 'R': 'L'} self.grid = collections.defaultdict(lambda:float('inf')) # DFS mark out the full map self.target = None self.dfs((0, 0), master) if self.target is None: return -1 # BFS find the shortest path return self.bfs((0, 0)) def dfs(self, point, master): if master.isTarget(): self.target = point # mark visited node as 0 self.grid[point] = 0 for d in self.dir: nxt_point = (point[0]+self.dir[d][0], point[1]+self.dir[d][1]) if master.canMove(d) and self.grid[nxt_point] == float('inf'): master.move(d) self.dfs(nxt_point, master) master.move(self.inv_dir[d]) def bfs(self, start): deque = collections.deque([(start, 0)]) while deque: point, step = deque.popleft() if point == self.target: return step for d in self.dir.values(): nxt_point = (point[0]+d[0], point[1]+d[1]) if self.grid[nxt_point] == 0: deque.append((nxt_point, step+1)) # mark visited node as -1 self.grid[nxt_point] = -1 ``` Please note that you didn't provide the `GridMaster` class, hence you have to make sure that this classes have those `isTarget`, `canMove` and `move` methods in order this code to work. 
labels: [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] ```python # Establish an integer list arr = [1, 2, 3, 4] # Determine the length of the list n = len(arr) # Initialize index at 0 i = 0 # Traverse the list and output each individual element while i < n: print(arr[i]) i += 1 ``` In the given code, it tries to access `arr[n]` which is out of bounds as python list indexing starts from 0 and ends at n-1 for a list of length n. Also, I used `while` loop as per your request. By initializing `i=0` before the loop and `i += 1` within the loop, we can iterate over the list. This iteration continues until `i` is less than `n`, the length of the list. The `while` loop will not proceed if the list is empty so it correctly handles empty lists. labelshere are several issues with your code like flaky names, wrong Python syntax etc. 
Here is how that code should look: ```python import collections class Solution: def findShortestPath(self, master: 'GridMaster') -> int: self.dir = {'U': [-1, 0],'D': [1, 0], 'L': [0,-1], 'R': [0, 1]} self.inv_dir = {'U': 'D', 'D': 'U', 'L': 'R', 'R': 'L'} self.grid = collections.defaultdict(lambda:float('inf')) # DFS mark out the full map self.target = None self.dfs((0, 0), master) if self.target is None: return -1 # BFS find the shortest path return self.bfs((0, 0)) def dfs(self, point, master): if master.isTarget(): self.target = point # mark visited node as 0 self.grid[point] = 0 for d in self.dir: nxt_point = (point[0]+self.dir[d][0], point[1]+self.dir[d][1]) if master.canMove(d) and self.grid[nxt_point] == float('inf'): master.move(d) self.dfs(nxt_point, master) master.move(self.inv_dir[d]) def bfs(self, start): deque = collections.deque([(start, 0)]) while deque: point, step = deque.popleft() if point == self.target: return step for d in self.dir.values(): nxt_point = (point[0]+d[0], point[1]+d[1]) if self.grid[nxt_point] == 0: deque.append((nxt_point, step+1)) # mark visited node as -1 self.grid[nxt_point] = -1 ``` Please note that you didn't provide the `GridMaster` class, hence you have to make sure that this classes have those `isTarget`, `canMove` and `move` methods in order this code to work.
label_idslength of input_ids : 1212 input_ids: [1, 518, 25580, 29962, 29875, 29915, 345, 2355, 445, 3017, 775, 515, 385, 288, 7283, 5780, 29892, 541, 372, 29915, 29879, 451, 1985, 29889, 508, 366, 4744, 372, 363, 592, 29973, 13, 1990, 29871, 29945, 324, 29894, 29955, 29991, 265, 29901, 13, 29877, 300, 260, 513, 21322, 30181, 303, 11457, 29876, 1566, 761, 29952, 302, 29876, 29987, 29879, 29955, 261, 29901, 525, 5756, 29924, 29987, 29879, 29974, 261, 1495, 448, 29966, 29871, 29896, 2273, 29901, 13, 1311, 29889, 29877, 29896, 29878, 353, 11117, 29965, 2396, 21069, 29896, 29892, 29871, 29900, 1822, 29918, 29915, 29900, 2396, 518, 29896, 29892, 29871, 29900, 1118, 525, 29931, 2396, 518, 29900, 29892, 29918, 29899, 29896, 1822, 525, 29934, 2396, 426, 29900, 29892, 29871, 29896, 12258, 13, 29879, 2152, 29889, 11569, 29918, 29881, 29991, 29878, 448, 11117, 29963, 2396, 525, 29928, 742, 525, 29928, 2396, 525, 29965, 742, 525, 29931, 2396, 525, 29934, 742, 525, 29934, 2396, 525, 29931, 10827, 13, 29920, 30181, 29989, 29888, 29889, 29887, 5378, 353, 313, 324, 781, 29875, 4835, 29889, 4381, 27774, 29955, 29898, 2892, 29901, 29918, 29888, 417, 29946, 29873, 877, 3083, 29888, 11287, 13, 29937, 360, 9998, 2791, 714, 278, 2989, 2910, 13, 29879, 30181, 29989, 29888, 29889, 12637, 479, 29974, 448, 10050, 30181, 13, 29945, 761, 29889, 29069, 3552, 29900, 29892, 29871, 29900, 511, 286, 29987, 303, 29941, 29878, 29897, 13, 29896, 29888, 395, 29872, 277, 29892, 29873, 29946, 29878, 657, 29918, 275, 390, 650, 29901, 13, 2267, 13416, 29876, 448, 29896, 13, 29950, 29871, 29947, 9998, 1284, 278, 5807, 29900, 29878, 2167, 29955, 29871, 29929, 29992, 29974, 29882, 13, 1212, 1038, 1583, 29889, 1635, 29879, 3552, 29949, 29892, 288, 876, 13, 13, 4801, 4489, 29879, 29898, 29920, 761, 29892, 1298, 29892, 5516, 29974, 264, 1125, 13, 361, 29918, 6207, 29892, 29875, 29945, 8667, 7295, 13, 1311, 29889, 5182, 448, 772, 326, 29873, 13, 29937, 302, 29876, 935, 16669, 2943, 408, 438, 13, 911, 29989, 
29873, 29892, 7720, 29961, 3149, 29913, 353, 29871, 29900, 13, 7345, 270, 297, 1583, 29892, 29900, 262, 29901, 13, 29878, 486, 29918, 29886, 29900, 326, 29873, 353, 313, 29929, 29877, 326, 29873, 29961, 29900, 10062, 1311, 29889, 3972, 29961, 29877, 850, 29900, 511, 282, 29949, 524, 29961, 29896, 10062, 344, 29989, 29888, 29889, 3972, 29961, 29900, 3816, 29875, 5262, 13, 361, 29918, 6207, 19423, 273, 25363, 29941, 29898, 29881, 29897, 322, 1583, 29892, 629, 29896, 29900, 29961, 29885, 486, 29918, 29886, 29949, 262, 29974, 21540, 1360, 260, 601, 271, 10998, 524, 29374, 13, 6207, 29889, 29885, 283, 29872, 29898, 29881, 29913, 13, 29920, 761, 29889, 29069, 29898, 29876, 1193, 29918, 3149, 29952, 611, 29938, 357, 29897, 13, 8247, 29955, 29941, 29878, 29889, 29885, 29949, 345, 29898, 29879, 2152, 29889, 11569, 29918, 29881, 29991, 29878, 29961, 29881, 1800, 13, 13, 1753, 289, 5847, 29898, 29945, 29872, 29989, 29888, 29892, 380, 29987, 2273, 1125, 13, 311, 29929, 434, 448, 313, 29949, 645, 29941, 1953, 29892, 311, 29939, 345, 4197, 29898, 303, 29946, 593, 29892, 29871, 29900, 29897, 2314, 13, 1332, 29991, 280, 270, 30181, 29939, 29894, 30181, 29901, 13, 3149, 29892, 4331, 353, 316, 802, 29889, 7323, 1563, 580, 13, 1896, 1298, 353, 29899, 29871, 29945, 761, 29889, 29873, 574, 29941, 29873, 29901, 13, 2457, 4331, 13, 29888, 29900, 29878, 270, 29918, 326, 1583, 29889, 29900, 262, 29889, 791, 1960, 7295, 13, 29876, 486, 29918, 3149, 29918, 29899, 313, 29886, 29900, 524, 29961, 29900, 29962, 29911, 29881, 29961, 29949, 1118, 29871, 29929, 29877, 29896, 593, 29961, 29896, 10062, 29881, 29961, 29896, 12258, 13, 277, 29918, 1311, 29892, 29929, 2429, 29961, 29876, 486, 29918, 1129, 262, 28135, 1275, 29871, 29900, 29901, 13, 29881, 30181, 802, 29892, 932, 8154, 3552, 29876, 486, 29918, 29939, 2461, 29892, 269, 29955, 1022, 29974, 29875, 876, 13, 29937, 286, 29992, 17697, 1998, 29991, 29974, 287, 2943, 2698, 448, 29896, 13, 1311, 29892, 5138, 333, 29961, 29876, 486, 29918, 3149, 
29897, 353, 448, 29896, 29961, 29914, 25580, 29962, 1670, 526, 3196, 5626, 411, 596, 775, 763, 17422, 3459, 2983, 29892, 2743, 5132, 5877, 2992, 29889, 13, 13, 10605, 338, 920, 393, 775, 881, 1106, 29901, 13, 13, 28956, 4691, 13, 5215, 16250, 13, 13, 1990, 24380, 29901, 13, 1678, 822, 1284, 21322, 342, 2605, 29898, 1311, 29892, 5835, 29901, 525, 5756, 19203, 1495, 1599, 938, 29901, 13, 4706, 1583, 29889, 3972, 353, 11117, 29965, 2396, 21069, 29896, 29892, 29871, 29900, 1402, 29915, 29928, 2396, 518, 29896, 29892, 29871, 29900, 1402, 525, 29931, 2396, 518, 29900, 6653, 29896, 1402, 525, 29934, 2396, 518, 29900, 29892, 29871, 29896, 12258, 13, 4706, 1583, 29889, 11569, 29918, 3972, 353, 11117, 29965, 2396, 525, 29928, 742, 525, 29928, 2396, 525, 29965, 742, 525, 29931, 2396, 525, 29934, 742, 525, 29934, 2396, 525, 29931, 10827, 13, 4706, 1583, 29889, 7720, 353, 16250, 29889, 4381, 8977, 29898, 2892, 29901, 7411, 877, 7192, 8785, 13, 4706, 396, 360, 9998, 2791, 714, 278, 2989, 2910, 13, 4706, 1583, 29889, 5182, 353, 6213, 13, 4706, 1583, 29889, 29069, 3552, 29900, 29892, 29871, 29900, 511, 5835, 29897, 13, 4706, 565, 1583, 29889, 5182, 338, 6213, 29901, 13, 9651, 736, 448, 29896, 13, 4706, 396, 350, 9998, 1284, 278, 3273, 342, 2224, 13, 4706, 736, 1583, 29889, 1635, 29879, 3552, 29900, 29892, 29871, 29900, 876, 13, 13, 1678, 822, 4489, 29879, 29898, 1311, 29892, 1298, 29892, 5835, 1125, 13, 4706, 565, 5835, 29889, 275, 8667, 7295, 13, 9651, 1583, 29889, 5182, 353, 1298, 13, 4706, 396, 2791, 16669, 2943, 408, 29871, 29900, 13, 4706, 1583, 29889, 7720, 29961, 3149, 29962, 353, 29871, 29900, 13, 4706, 363, 270, 297, 1583, 29889, 3972, 29901, 13, 9651, 302, 486, 29918, 3149, 353, 313, 3149, 29961, 29900, 10062, 1311, 29889, 3972, 29961, 29881, 3816, 29900, 1402, 1298, 29961, 29896, 10062, 1311, 29889, 3972, 29961, 29881, 3816, 29896, 2314, 13, 9651, 565, 5835, 29889, 3068, 16619, 29898, 29881, 29897, 322, 1583, 29889, 7720, 29961, 29876, 486, 29918, 3149, 29962, 1275, 
5785, 877, 7192, 29374, 13, 18884, 5835, 29889, 11631, 29898, 29881, 29897, 13, 18884, 1583, 29889, 29069, 29898, 29876, 486, 29918, 3149, 29892, 5835, 29897, 13, 18884, 5835, 29889, 11631, 29898, 1311, 29889, 11569, 29918, 3972, 29961, 29881, 2314, 13, 13, 1678, 822, 289, 5847, 29898, 1311, 29892, 1369, 1125, 13, 4706, 316, 802, 353, 16250, 29889, 311, 802, 4197, 29898, 2962, 29892, 29871, 29900, 29897, 2314, 13, 4706, 1550, 316, 802, 29901, 13, 9651, 1298, 29892, 4331, 353, 316, 802, 29889, 7323, 1563, 580, 13, 9651, 565, 1298, 1275, 1583, 29889, 5182, 29901, 13, 18884, 736, 4331, 13, 9651, 363, 270, 297, 1583, 29889, 3972, 29889, 5975, 7295, 13, 18884, 302, 486, 29918, 3149, 353, 313, 3149, 29961, 29900, 10062, 29881, 29961, 29900, 1402, 1298, 29961, 29896, 10062, 29881, 29961, 29896, 2314, 13, 18884, 565, 1583, 29889, 7720, 29961, 29876, 486, 29918, 3149, 29962, 1275, 29871, 29900, 29901, 13, 462, 1678, 316, 802, 29889, 4397, 3552, 29876, 486, 29918, 3149, 29892, 4331, 29974, 29896, 876, 13, 462, 1678, 396, 2791, 16669, 2943, 408, 448, 29896, 13, 462, 1678, 1583, 29889, 7720, 29961, 29876, 486, 29918, 3149, 29962, 353, 448, 29896, 13, 28956, 13, 12148, 4443, 393, 366, 3282, 29915, 29873, 3867, 278, 421, 5756, 19203, 29952, 770, 29892, 8151, 366, 505, 304, 1207, 1854, 393, 445, 4413, 505, 1906, 421, 275, 8667, 1673, 421, 3068, 16619, 29952, 322, 421, 11631, 29952, 3519, 297, 1797, 445, 775, 304, 664, 29889, 2] labelshere are several issues with your code like flaky names, wrong Python syntax etc. 
Here is how that code should look:

```python
import collections

class Solution:
    def findShortestPath(self, master: 'GridMaster') -> int:
        self.dir = {'U': [-1, 0],'D': [1, 0], 'L': [0,-1], 'R': [0, 1]}
        self.inv_dir = {'U': 'D', 'D': 'U', 'L': 'R', 'R': 'L'}
        self.grid = collections.defaultdict(lambda:float('inf'))
        # DFS mark out the full map
        self.target = None
        self.dfs((0, 0), master)
        if self.target is None:
            return -1
        # BFS find the shortest path
        return self.bfs((0, 0))

    def dfs(self, point, master):
        if master.isTarget():
            self.target = point
        # mark visited node as 0
        self.grid[point] = 0
        for d in self.dir:
            nxt_point = (point[0]+self.dir[d][0], point[1]+self.dir[d][1])
            if master.canMove(d) and self.grid[nxt_point] == float('inf'):
                master.move(d)
                self.dfs(nxt_point, master)
                master.move(self.inv_dir[d])

    def bfs(self, start):
        deque = collections.deque([(start, 0)])
        while deque:
            point, step = deque.popleft()
            if point == self.target:
                return step
            for d in self.dir.values():
                nxt_point = (point[0]+d[0], point[1]+d[1])
                if self.grid[nxt_point] == 0:
                    deque.append((nxt_point, step+1))
                    # mark visited node as -1
                    self.grid[nxt_point] = -1
```
Please note that you didn't provide the `GridMaster` class, hence you have to make sure that this classes have those `isTarget`, `canMove` and `move` methods in order this code to work.
inputs: [INST]i've got this python code from an ocr tool, but it's not working. can you debug it for me?
class 5olv7!on:
oet tindShort€stPatn($elf` nn&s7er: 'GridM&s+er') -< 1rt:
self.o1r = {'U': [-1, 0]._'0': [1, 0}, 'L': [0,_-1]. 'R': {0, 1]}
selt.inv_d!r - {'V': 'D', 'D': 'U', 'L': 'R', 'R': 'L'}
z€|f.grio = (ollectioms.defaultdic7(lambda:_flo4t('lnf'])
# DFS mark out the full map
s€|f.targe+ - Non€
5elf.dfs((0, 0), m&st3r)
1f $eit,t4rget_is Rone:
retvrn -1
H 8FS find the sn0rtes7 9@+h
neturr self.bfs((O, o))

det dfs(zelf, point, mas+en):
if_master,i5Target():
self.target - poimt
# nnark visited node as O
ze|t,grid[point} = 0
tor d in self,0in:
rxt_p0imt = (9oimt[0]+self.dir[o](0), pOint[1]+se|f.dir[0][i]]
if_master. int:
        self.dir = {'U': [-1, 0],'D': [1, 0], 'L': [0,-1], 'R': [0, 1]}
        self.inv_dir = {'U': 'D', 'D': 'U', 'L': 'R', 'R': 'L'}
        self.grid = collections.defaultdict(lambda:float('inf'))
        # DFS mark out the full map
        self.target = None
        self.dfs((0, 0), master)
        if self.target is None:
            return -1
        # BFS find the shortest path
        return self.bfs((0, 0))

    def dfs(self, point, master):
        if master.isTarget():
            self.target = point
        # mark visited node as 0
        self.grid[point] = 0
        for d in self.dir:
            nxt_point = (point[0]+self.dir[d][0], point[1]+self.dir[d][1])
            if master.canMove(d) and self.grid[nxt_point] == float('inf'):
                master.move(d)
                self.dfs(nxt_point, master)
                master.move(self.inv_dir[d])

    def bfs(self, start):
        deque = collections.deque([(start, 0)])
        while deque:
            point, step = deque.popleft()
            if point == self.target:
                return step
            for d in self.dir.values():
                nxt_point = (point[0]+d[0], point[1]+d[1])
                if self.grid[nxt_point] == 0:
                    deque.append((nxt_point, step+1))
                    # mark visited node as -1
                    self.grid[nxt_point] = -1
```
Please note that you didn't provide the `GridMaster` class, hence you have to make sure that this classes have those `isTarget`, `canMove` and `move` methods in order this code to work.
label_idslabelshere are several issues with your code like flaky names, wrong Python syntax etc.
Here is how that code should look:

```python
import collections

class Solution:
    def findShortestPath(self, master: 'GridMaster') -> int:
        self.dir = {'U': [-1, 0],'D': [1, 0], 'L': [0,-1], 'R': [0, 1]}
        self.inv_dir = {'U': 'D', 'D': 'U', 'L': 'R', 'R': 'L'}
        self.grid = collections.defaultdict(lambda:float('inf'))
        # DFS mark out the full map
        self.target = None
        self.dfs((0, 0), master)
        if self.target is None:
            return -1
        # BFS find the shortest path
        return self.bfs((0, 0))

    def dfs(self, point, master):
        if master.isTarget():
            self.target = point
        # mark visited node as 0
        self.grid[point] = 0
        for d in self.dir:
            nxt_point = (point[0]+self.dir[d][0], point[1]+self.dir[d][1])
            if master.canMove(d) and self.grid[nxt_point] == float('inf'):
                master.move(d)
                self.dfs(nxt_point, master)
                master.move(self.inv_dir[d])

    def bfs(self, start):
        deque = collections.deque([(start, 0)])
        while deque:
            point, step = deque.popleft()
            if point == self.target:
                return step
            for d in self.dir.values():
                nxt_point = (point[0]+d[0], point[1]+d[1])
                if self.grid[nxt_point] == 0:
                    deque.append((nxt_point, step+1))
                    # mark visited node as -1
                    self.grid[nxt_point] = -1
```
Please note that you didn't provide the `GridMaster` class, hence you have to make sure that this classes have those `isTarget`, `canMove` and `move` methods in order this code to work.
12/25/2023 02:59:26 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:2 to store for rank: 3
12/25/2023 02:59:26 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:2 to store for rank: 2
12/25/2023 02:59:26 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:2 to store for rank: 4
12/25/2023 02:59:26 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:2 to store for rank: 5
12/25/2023 02:59:26 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:2 to store for rank: 1
12/25/2023 02:59:26 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:2 to store for rank: 7
12/25/2023 02:59:26 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:2 to store for rank: 6
length of input_ids : 394
input_ids: [1, 518, 25580, 29962, 12148, 626, 355, 278, 15352, 5132, 2471, 577, 393, 372, 7805, 263, 525, 8000, 29915, 2425, 3265, 1135, 278, 5923, 525, 1454, 29915, 2425, 29892, 607, 4256, 1078, 1549, 278, 4452, 310, 385, 6043, 1051, 29889, 13, 13, 1576, 2471, 5279, 756, 263, 6494, 988, 372, 14734, 304, 1596, 385, 1203, 393, 338, 5377, 278, 13451, 310, 278, 1051, 29889, 24778, 445, 1059, 322, 6623, 278, 2471, 304, 671, 525, 8000, 29915, 2012, 310, 525, 1454, 29915, 2425, 29889, 22521, 545, 596, 2471, 5149, 17766, 4069, 8857, 29889, 29871, 13, 13, 28956, 4691, 13, 29871, 396, 2661, 370, 1674, 385, 6043, 1051, 13, 29871, 3948, 353, 518, 29896, 29892, 29871, 29906, 29892, 29871, 29941, 29892, 29871, 29946, 29962, 13, 13, 29871, 396, 5953, 837, 457, 278, 3309, 310, 278, 1051, 13, 29871, 302, 353, 7431, 29898, 2749, 29897, 13, 13, 29871, 396, 3201, 3901, 278, 1051, 322, 1962, 1269, 5375, 1543, 13, 29871, 363, 474, 297, 3464, 29898, 29876, 29974, 29896, 1125, 13, 418, 1596, 29898, 2749, 29961, 29875, 2314, 13, 28956, 29961, 29914, 25580, 29962, 7521, 4691, 13, 29937, 2661, 370, 1674, 385,
6043, 1051, 13, 2749, 353, 518, 29896, 29892, 29871, 29906, 29892, 29871, 29941, 29892, 29871, 29946, 29962, 13, 13, 29937, 5953, 837, 457, 278, 3309, 310, 278, 1051, 13, 29876, 353, 7431, 29898, 2749, 29897, 13, 13, 29937, 25455, 2380, 472, 29871, 29900, 13, 29875, 353, 29871, 29900, 13, 13, 29937, 3201, 3901, 278, 1051, 322, 1962, 1269, 5375, 1543, 13, 8000, 474, 529, 302, 29901, 13, 1678, 1596, 29898, 2749, 29961, 29875, 2314, 13, 1678, 474, 4619, 29871, 29896, 13, 28956, 13, 797, 278, 2183, 775, 29892, 372, 14335, 304, 2130, 421, 2749, 29961, 29876, 7961, 607, 338, 714, 310, 13451, 408, 3017, 1051, 26190, 8665, 515, 29871, 29900, 322, 10614, 472, 302, 29899, 29896, 363, 263, 1051, 310, 3309, 302, 29889, 13, 17351, 29892, 306, 1304, 421, 8000, 29952, 2425, 408, 639, 596, 2009, 29889, 2648, 2847, 5281, 421, 29875, 29922, 29900, 29952, 1434, 278, 2425, 322, 421, 29875, 4619, 29871, 29896, 29952, 2629, 278, 2425, 29892, 591, 508, 13649, 975, 278, 1051, 29889, 910, 12541, 18172, 2745, 421, 29875, 29952, 338, 3109, 1135, 421, 29876, 1673, 278, 3309, 310, 278, 1051, 29889, 450, 421, 8000, 29952, 2425, 674, 451, 8469, 565, 278, 1051, 338, 4069, 577, 372, 5149, 17766, 4069, 8857, 29889, 2] inputs: [INST]Please amend the subsequent Python script so that it includes a 'while' loop rather than the existing 'for' loop, which iterates through the items of an integer list. The script currently has a bug where it attempts to print an object that is outside the bounds of the list. Fix this error and modify the script to use 'while' instead of 'for' loop. Ensure your script correctly handles empty lists. 
```python
  # Establish an integer list
  arr = [1, 2, 3, 4]

  # Determine the length of the list
  n = len(arr)

  # Traverse the list and output each individual element
  for i in range(n+1):
      print(arr[i])
```[/INST] ```python
# Establish an integer list
arr = [1, 2, 3, 4]

# Determine the length of the list
n = len(arr)

# Initialize index at 0
i = 0

# Traverse the list and output each individual element
while i < n:
    print(arr[i])
    i += 1
```
In the given code, it tries to access `arr[n]` which is out of bounds as python list indexing starts from 0 and ends at n-1 for a list of length n.
Also, I used `while` loop as per your request. By initializing `i=0` before the loop and `i += 1` within the loop, we can iterate over the list. This iteration continues until `i` is less than `n`, the length of the list. The `while` loop will not proceed if the list is empty so it correctly handles empty lists.
label_idslabels```python
# Establish an integer list
arr = [1, 2, 3, 4]

# Determine the length of the list
n = len(arr)

# Initialize index at 0
i = 0

# Traverse the list and output each individual element
while i < n:
    print(arr[i])
    i += 1
```
In the given code, it tries to access `arr[n]` which is out of bounds as python list indexing starts from 0 and ends at n-1 for a list of length n.
Also, I used `while` loop as per your request. By initializing `i=0` before the loop and `i += 1` within the loop, we can iterate over the list. This iteration continues until `i` is less than `n`, the length of the list. The `while` loop will not proceed if the list is empty so it correctly handles empty lists.
length of input_ids : 1212 input_ids: [1, 518, 25580, 29962, 29875, 29915, 345, 2355, 445, 3017, 775, 515, 385, 288, 7283, 5780, 29892, 541, 372, 29915, 29879, 451, 1985, 29889, 508, 366, 4744, 372, 363, 592, 29973, 13, 1990, 29871, 29945, 324, 29894, 29955, 29991, 265, 29901, 13, 29877, 300, 260, 513, 21322, 30181, 303, 11457, 29876, 1566, 761, 29952, 302, 29876, 29987, 29879, 29955, 261, 29901, 525, 5756, 29924, 29987, 29879, 29974, 261, 1495, 448, 29966, 29871, 29896, 2273, 29901, 13, 1311, 29889, 29877, 29896, 29878, 353, 11117, 29965, 2396, 21069, 29896, 29892, 29871, 29900, 1822, 29918, 29915, 29900, 2396, 518, 29896, 29892, 29871, 29900, 1118, 525, 29931, 2396, 518, 29900, 29892, 29918, 29899, 29896, 1822, 525, 29934, 2396, 426, 29900, 29892, 29871, 29896, 12258, 13, 29879, 2152, 29889, 11569, 29918, 29881, 29991, 29878, 448, 11117, 29963, 2396, 525, 29928, 742, 525, 29928, 2396, 525, 29965, 742, 525, 29931, 2396, 525, 29934, 742, 525, 29934, 2396, 525, 29931, 10827, 13, 29920, 30181, 29989, 29888, 29889, 29887, 5378, 353, 313, 324, 781, 29875, 4835, 29889, 4381, 27774, 29955, 29898, 2892, 29901, 29918, 29888, 417, 29946, 29873, 877, 3083, 29888, 11287, 13, 29937, 360, 9998, 2791, 714, 278, 2989, 2910, 13, 29879, 30181, 29989, 29888, 29889, 12637, 479, 29974, 448, 10050, 30181, 13, 29945, 761, 29889, 29069, 3552, 29900, 29892, 29871, 29900, 511, 286, 29987, 303, 29941, 29878, 29897, 13, 29896, 29888, 395, 29872, 277, 29892, 29873, 29946, 29878, 657, 29918, 275, 390, 650, 29901, 13, 2267, 13416, 29876, 448, 29896, 13, 29950, 29871, 29947, 9998, 1284, 278, 5807, 29900, 29878, 2167, 29955, 29871, 29929, 29992, 29974, 29882, 13, 1212, 1038, 1583, 29889, 1635, 29879, 3552, 29949, 29892, 288, 876, 13, 13, 4801, 4489, 29879, 29898, 29920, 761, 29892, 1298, 29892, 5516, 29974, 264, 1125, 13, 361, 29918, 6207, 29892, 29875, 29945, 8667, 7295, 13, 1311, 29889, 5182, 448, 772, 326, 29873, 13, 29937, 302, 29876, 935, 16669, 2943, 408, 438, 13, 911, 29989, 29873, 29892, 
7720, 29961, 3149, 29913, 353, 29871, 29900, 13, 7345, 270, 297, 1583, 29892, 29900, 262, 29901, 13, 29878, 486, 29918, 29886, 29900, 326, 29873, 353, 313, 29929, 29877, 326, 29873, 29961, 29900, 10062, 1311, 29889, 3972, 29961, 29877, 850, 29900, 511, 282, 29949, 524, 29961, 29896, 10062, 344, 29989, 29888, 29889, 3972, 29961, 29900, 3816, 29875, 5262, 13, 361, 29918, 6207, 19423, 273, 25363, 29941, 29898, 29881, 29897, 322, 1583, 29892, 629, 29896, 29900, 29961, 29885, 486, 29918, 29886, 29949, 262, 29974, 21540, 1360, 260, 601, 271, 10998, 524, 29374, 13, 6207, 29889, 29885, 283, 29872, 29898, 29881, 29913, 13, 29920, 761, 29889, 29069, 29898, 29876, 1193, 29918, 3149, 29952, 611, 29938, 357, 29897, 13, 8247, 29955, 29941, 29878, 29889, 29885, 29949, 345, 29898, 29879, 2152, 29889, 11569, 29918, 29881, 29991, 29878, 29961, 29881, 1800, 13, 13, 1753, 289, 5847, 29898, 29945, 29872, 29989, 29888, 29892, 380, 29987, 2273, 1125, 13, 311, 29929, 434, 448, 313, 29949, 645, 29941, 1953, 29892, 311, 29939, 345, 4197, 29898, 303, 29946, 593, 29892, 29871, 29900, 29897, 2314, 13, 1332, 29991, 280, 270, 30181, 29939, 29894, 30181, 29901, 13, 3149, 29892, 4331, 353, 316, 802, 29889, 7323, 1563, 580, 13, 1896, 1298, 353, 29899, 29871, 29945, 761, 29889, 29873, 574, 29941, 29873, 29901, 13, 2457, 4331, 13, 29888, 29900, 29878, 270, 29918, 326, 1583, 29889, 29900, 262, 29889, 791, 1960, 7295, 13, 29876, 486, 29918, 3149, 29918, 29899, 313, 29886, 29900, 524, 29961, 29900, 29962, 29911, 29881, 29961, 29949, 1118, 29871, 29929, 29877, 29896, 593, 29961, 29896, 10062, 29881, 29961, 29896, 12258, 13, 277, 29918, 1311, 29892, 29929, 2429, 29961, 29876, 486, 29918, 1129, 262, 28135, 1275, 29871, 29900, 29901, 13, 29881, 30181, 802, 29892, 932, 8154, 3552, 29876, 486, 29918, 29939, 2461, 29892, 269, 29955, 1022, 29974, 29875, 876, 13, 29937, 286, 29992, 17697, 1998, 29991, 29974, 287, 2943, 2698, 448, 29896, 13, 1311, 29892, 5138, 333, 29961, 29876, 486, 29918, 3149, 29897, 353, 448, 
29896, 29961, 29914, 25580, 29962, 1670, 526, 3196, 5626, 411, 596, 775, 763, 17422, 3459, 2983, 29892, 2743, 5132, 5877, 2992, 29889, 13, 13, 10605, 338, 920, 393, 775, 881, 1106, 29901, 13, 13, 28956, 4691, 13, 5215, 16250, 13, 13, 1990, 24380, 29901, 13, 1678, 822, 1284, 21322, 342, 2605, 29898, 1311, 29892, 5835, 29901, 525, 5756, 19203, 1495, 1599, 938, 29901, 13, 4706, 1583, 29889, 3972, 353, 11117, 29965, 2396, 21069, 29896, 29892, 29871, 29900, 1402, 29915, 29928, 2396, 518, 29896, 29892, 29871, 29900, 1402, 525, 29931, 2396, 518, 29900, 6653, 29896, 1402, 525, 29934, 2396, 518, 29900, 29892, 29871, 29896, 12258, 13, 4706, 1583, 29889, 11569, 29918, 3972, 353, 11117, 29965, 2396, 525, 29928, 742, 525, 29928, 2396, 525, 29965, 742, 525, 29931, 2396, 525, 29934, 742, 525, 29934, 2396, 525, 29931, 10827, 13, 4706, 1583, 29889, 7720, 353, 16250, 29889, 4381, 8977, 29898, 2892, 29901, 7411, 877, 7192, 8785, 13, 4706, 396, 360, 9998, 2791, 714, 278, 2989, 2910, 13, 4706, 1583, 29889, 5182, 353, 6213, 13, 4706, 1583, 29889, 29069, 3552, 29900, 29892, 29871, 29900, 511, 5835, 29897, 13, 4706, 565, 1583, 29889, 5182, 338, 6213, 29901, 13, 9651, 736, 448, 29896, 13, 4706, 396, 350, 9998, 1284, 278, 3273, 342, 2224, 13, 4706, 736, 1583, 29889, 1635, 29879, 3552, 29900, 29892, 29871, 29900, 876, 13, 13, 1678, 822, 4489, 29879, 29898, 1311, 29892, 1298, 29892, 5835, 1125, 13, 4706, 565, 5835, 29889, 275, 8667, 7295, 13, 9651, 1583, 29889, 5182, 353, 1298, 13, 4706, 396, 2791, 16669, 2943, 408, 29871, 29900, 13, 4706, 1583, 29889, 7720, 29961, 3149, 29962, 353, 29871, 29900, 13, 4706, 363, 270, 297, 1583, 29889, 3972, 29901, 13, 9651, 302, 486, 29918, 3149, 353, 313, 3149, 29961, 29900, 10062, 1311, 29889, 3972, 29961, 29881, 3816, 29900, 1402, 1298, 29961, 29896, 10062, 1311, 29889, 3972, 29961, 29881, 3816, 29896, 2314, 13, 9651, 565, 5835, 29889, 3068, 16619, 29898, 29881, 29897, 322, 1583, 29889, 7720, 29961, 29876, 486, 29918, 3149, 29962, 1275, 5785, 877, 7192, 
29374, 13, 18884, 5835, 29889, 11631, 29898, 29881, 29897, 13, 18884, 1583, 29889, 29069, 29898, 29876, 486, 29918, 3149, 29892, 5835, 29897, 13, 18884, 5835, 29889, 11631, 29898, 1311, 29889, 11569, 29918, 3972, 29961, 29881, 2314, 13, 13, 1678, 822, 289, 5847, 29898, 1311, 29892, 1369, 1125, 13, 4706, 316, 802, 353, 16250, 29889, 311, 802, 4197, 29898, 2962, 29892, 29871, 29900, 29897, 2314, 13, 4706, 1550, 316, 802, 29901, 13, 9651, 1298, 29892, 4331, 353, 316, 802, 29889, 7323, 1563, 580, 13, 9651, 565, 1298, 1275, 1583, 29889, 5182, 29901, 13, 18884, 736, 4331, 13, 9651, 363, 270, 297, 1583, 29889, 3972, 29889, 5975, 7295, 13, 18884, 302, 486, 29918, 3149, 353, 313, 3149, 29961, 29900, 10062, 29881, 29961, 29900, 1402, 1298, 29961, 29896, 10062, 29881, 29961, 29896, 2314, 13, 18884, 565, 1583, 29889, 7720, 29961, 29876, 486, 29918, 3149, 29962, 1275, 29871, 29900, 29901, 13, 462, 1678, 316, 802, 29889, 4397, 3552, 29876, 486, 29918, 3149, 29892, 4331, 29974, 29896, 876, 13, 462, 1678, 396, 2791, 16669, 2943, 408, 448, 29896, 13, 462, 1678, 1583, 29889, 7720, 29961, 29876, 486, 29918, 3149, 29962, 353, 448, 29896, 13, 28956, 13, 12148, 4443, 393, 366, 3282, 29915, 29873, 3867, 278, 421, 5756, 19203, 29952, 770, 29892, 8151, 366, 505, 304, 1207, 1854, 393, 445, 4413, 505, 1906, 421, 275, 8667, 1673, 421, 3068, 16619, 29952, 322, 421, 11631, 29952, 3519, 297, 1797, 445, 775, 304, 664, 29889, 2] inputs: [INST]i've got this python code from an ocr tool, but it's not working. can you debug it for me? class 5olv7!on: oet tindShort€stPatn($elf` nn&s7er: 'GridM&s+er') -< 1rt: self.o1r = {'U': [-1, 0]._'0': [1, 0}, 'L': [0,_-1]. 
'R': {0, 1]}
selt.inv_d!r - {'V': 'D', 'D': 'U', 'L': 'R', 'R': 'L'}
z€|f.grio = (ollectioms.defaultdic7(lambda:_flo4t('lnf'])
# DFS mark out the full map
s€|f.targe+ - Non€
5elf.dfs((0, 0), m&st3r)
1f $eit,t4rget_is Rone:
retvrn -1
H 8FS find the sn0rtes7 9@+h
neturr self.bfs((O, o))

det dfs(zelf, point, mas+en):
if_master,i5Target():
self.target - poimt
# nnark visited node as O
ze|t,grid[point} = 0
tor d in self,0in:
rxt_p0imt = (9oimt[0]+self.dir[o](0), pOint[1]+se|f.dir[0][i]]
if_master. int:
        self.dir = {'U': [-1, 0],'D': [1, 0], 'L': [0,-1], 'R': [0, 1]}
        self.inv_dir = {'U': 'D', 'D': 'U', 'L': 'R', 'R': 'L'}
        self.grid = collections.defaultdict(lambda:float('inf'))
        # DFS mark out the full map
        self.target = None
        self.dfs((0, 0), master)
        if self.target is None:
            return -1
        # BFS find the shortest path
        return self.bfs((0, 0))

    def dfs(self, point, master):
        if master.isTarget():
            self.target = point
        # mark visited node as 0
        self.grid[point] = 0
        for d in self.dir:
            nxt_point = (point[0]+self.dir[d][0], point[1]+self.dir[d][1])
            if master.canMove(d) and self.grid[nxt_point] == float('inf'):
                master.move(d)
                self.dfs(nxt_point, master)
                master.move(self.inv_dir[d])

    def bfs(self, start):
        deque = collections.deque([(start, 0)])
        while deque:
            point, step = deque.popleft()
            if point == self.target:
                return step
            for d in self.dir.values():
                nxt_point = (point[0]+d[0], point[1]+d[1])
                if self.grid[nxt_point] == 0:
                    deque.append((nxt_point, step+1))
                    # mark visited node as -1
                    self.grid[nxt_point] = -1
```
Please note that you didn't provide the `GridMaster` class, hence you have to make sure that this classes have those `isTarget`, `canMove` and `move` methods in order this code to work.
label_idslabelshere are several issues with your code like flaky names, wrong Python syntax etc.
Here is how that code should look:

```python
import collections

class Solution:
    def findShortestPath(self, master: 'GridMaster') -> int:
        self.dir = {'U': [-1, 0],'D': [1, 0], 'L': [0,-1], 'R': [0, 1]}
        self.inv_dir = {'U': 'D', 'D': 'U', 'L': 'R', 'R': 'L'}
        self.grid = collections.defaultdict(lambda:float('inf'))
        # DFS mark out the full map
        self.target = None
        self.dfs((0, 0), master)
        if self.target is None:
            return -1
        # BFS find the shortest path
        return self.bfs((0, 0))

    def dfs(self, point, master):
        if master.isTarget():
            self.target = point
        # mark visited node as 0
        self.grid[point] = 0
        for d in self.dir:
            nxt_point = (point[0]+self.dir[d][0], point[1]+self.dir[d][1])
            if master.canMove(d) and self.grid[nxt_point] == float('inf'):
                master.move(d)
                self.dfs(nxt_point, master)
                master.move(self.inv_dir[d])

    def bfs(self, start):
        deque = collections.deque([(start, 0)])
        while deque:
            point, step = deque.popleft()
            if point == self.target:
                return step
            for d in self.dir.values():
                nxt_point = (point[0]+d[0], point[1]+d[1])
                if self.grid[nxt_point] == 0:
                    deque.append((n
12/25/2023 02:59:28 - INFO - torch.distributed.distributed_c10d - Rank 25: Completed store-based barrier for key:store_based_barrier_key:2 with 32 nodes.
12/25/2023 02:59:28 - INFO - torch.distributed.distributed_c10d - Rank 26: Completed store-based barrier for key:store_based_barrier_key:2 with 32 nodes.
vc-816648091-20231223-a1d29e4b-w
12/25/2023 02:59:28 - INFO - torch.distributed.distributed_c10d - Rank 10: Completed store-based barrier for key:store_based_barrier_key:2 with 32 nodes.
12/25/2023 02:59:28 - INFO - torch.distributed.distributed_c10d - Rank 12: Completed store-based barrier for key:store_based_barrier_key:2 with 32 nodes.
12/25/2023 02:59:28 - INFO - torch.distributed.distributed_c10d - Rank 14: Completed store-based barrier for key:store_based_barrier_key:2 with 32 nodes.
vc-816648091-20231223-a1d29e4b-worker-1:5787:6804 [2] NCCL INFO Using network IB
vc-816648091-20231223-a1d29e4b-worker-1:5789:6806 [4] NCCL INFO Using network IB
vc-816648091-20231223-a1d29e4b-worker-1:5791:6805 [6] NCCL INFO Using network IB
12/25/2023 02:59:28 - INFO - torch.distributed.distributed_c10d - Rank 11: Completed store-based barrier for key:store_based_barrier_key:2 with 32 nodes.
12/25/2023 02:59:28 - INFO - torch.distributed.distributed_c10d - Rank 13: Completed store-based barrier for key:store_based_barrier_key:2 with 32 nodes.
vc-816648091-20231223-a1d29e4b-worker-1:5788:6808 [3] NCCL INFO Using network IB
vc-816648091-20231223-a1d29e4b-worker-1:5790:6807 [5] NCCL INFO Using network IB
12/25/2023 02:59:28 - INFO - torch.distributed.di
vc-816648091-20231223-a1d29e4b-worker-3:6073:7103 [4] NCCL INFO Using network IB
12/25/2023 02:59:28 - INFO - torch.distributed.distributed_c10d - Rank 24: Completed store-based barrier
12/25/2023 02:59:28 - INFO - torch.distributed.distributed_c10d - Rank 15: Completed store-based barrier for key:store_based_barrie
vc-816648091-20231223-a1d29e4b-worker-3:6072:7101 [3] NCCL INFO Setting affinity for GPU 3 to ffff,ffffffff,00000000,0000ffff,ffffffff
vc-816648091-20231223-a1d29e4b-worker-3:6074:7100 [5] NCCL INFO Setting affinity for GPU 5 to ffffffff,ffff0000,00000000,ffffffff,ffff0000,00000000
vc-816648091-20231223-a1d29e4b-worker-3:6069:7104 [0] NCCL INFO Setting affinity for GPU 0 to ffff,ffffffff,00000000,0000ffff,ffffffff
vc-816648091-20231223-a1d29e4b-worker-3:6075:7102 [6] NCCL INFO Setting affinity for GPU 6 to ffffffff,ffff0000,0000000
vc-816648091-20231223-a1d29e4
vc-816648091-20231223-a1d29e4b-worker-3:6073:7103 [4] NCCL INFO Setting affinity for GPU 4 to ffffffff,fff
vc-816648091-20231223-a1d29e4b-worker-2:5774:6808 [0] NCCL INFO Setting affinity for GPU 0 to ffff,ffffffff,00000000,0000ffff,ffffffff
v
vc-816648091-20231223-a1d29e4b-worker-0:5795:6814 [1] NCCL INFO Setting affinity for GPU 1 to ffff,ffffffff,00000000,0000ffff,ffffffff
vc-816648091-20231223-a1d29e4b-worker-0:5794:6813 [0] NCCL INFO Setting affinity for GPU 0 to ffff,ffffffff,00000000,0000ffff,ffffffff
vc-816648091-20231223-a1d29e4b-worker-0:5798:6817 [4]
vc-816648091-20231223-a1d29e4b-worker-3:6069:7104 [0] NCCL INFO Trees [0] 25/-1/-1->24->31 [1] 25/-1/-1->24->31
vc-816648091-20231223-a1d29e4b-worker-3:6071:7098 [2] NCCL INFO Trees [0] 27/-1/-1->26->18 [1] 27/10/-1->26->-1
vc-816648091-20231223-a1d29e4b-worker-3:6070:7097 [1] NCCL INFO Trees [0] -1/-1/-1->25->24 [1] -1/-1/-1->25->24
vc-816648091-20231223-a1d29e4b-worker-3:6072:7101 [3] NCCL INFO Trees [0] 28/-1/-1->27->26 [1] 28/-1/-1->27->26
vc-816648091-20231223-a1d29e4b-worker-3:6073:7103 [4] NCCL INFO Trees [0] 29/-1/-1->28->27 [1] 29/-1/-1->28->27
vc-816648091-20231223-a1d29e4b-worker-3:6074:7100 [5] NCCL INFO Trees [0] 30/-1/-1->29->28 [1] 30/-1/-1->29->28
vc-816648091-20231223-a1d29e4b-worker-3:6076:7099 [7] NCCL INFO Trees [0] 24/-1/-1->31->30 [1]
vc-816648091-2023
vc-816648091-20231223-a1d29e4b-worker-1:5785:6809 [0] NCCL INFO Trees [0] 9/-1/-1->8->15 [1] 9/-1/-1->8->15
vc-816648091-20231223-a1d29e4b-worker-1:5788:6808 [3] NCCL INFO Trees [0] 12/-1/-1->11->10 [1] 12/18/-1->11->10
vc-816648091-20231223-a1d29e4b-worker-1:5789:6806 [4] NCCL INFO Trees [0] 13/-1/-1->12->11 [1] 13/-1/-1->12->11
vc-816648091-20231223-a1d29e4b-worker-1:5786:6811 [1] NCCL INFO Trees [0] -1/-1/-1->9->8 [1] -1/-1/-1->9->8
vc-816648091-20231223-a1d29e4
vc-816648091-20231223-a1d29e4b-worker-3:6073:7103 [4] NCCL INFO Channel 01/0 : 28[93000] -> 25[13000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-3:6071:7098 [2] NCCL INFO Channel 01/0 : 26[4b000] -> 31[d0000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-3:6069:7104 [0] NCCL INFO Channel 01/0 : 24[e000]
vc-816648091-20231223-a1d29e4b-worker-1:5787:6804 [2] NCCL INFO Channel 01/0 : 10[4b000] -> 15[d0000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5785:6809 [0] NCCL INFO Channel 01/0 : 8[e000] -> 11[51000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5789:6806 [4] NCCL INFO Channel 01/0 : 12[93000] -> 9[13000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5791:6805 [6] NCCL INFO Channel 00/0 : 14[cb000] -> 13[99000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5790:6807 [5] NCCL INFO Channel 00/0 : 13[99000] -> 12[93000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5791:6805 [6] NCCL INFO Channel 01/0 : 14[cb000] -> 13[99000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5790:6807 [5] NCCL INFO Channel 01/0 : 13[99000] -> 12[93000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5792:6810 [7] NCCL INFO Channel 00/0 : 15[d0000] -> 14[cb000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5786:6811 [1] NCCL INFO Channel 00/0 : 9[13000] -> 8[e000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5792:6810 [7] NCCL INFO Channel 01/0 : 15[d0000] -> 14[cb000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5786:6811 [1] NCCL INFO Channel 01/0 : 9[13000] -> 8[e000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5792:6810 [7] NCCL INFO Connected all rings
vc-816648091-20231223-a1d29e4b-worker-1:5789:6806 [4] NCCL INFO Connected all rings
vc-816648091-20231223-a1d29e4b-worker-1:5786:6811 [1] NCCL INFO Connected all rings
vc-816648091-20231223-a1d29e4b-worker-1:5785:6809 [0] NCCL INFO Connected all rings
vc-816648091-20231223-a1d29e4b-worker-1:5785:6809 [0] NCCL INFO Channel 00/0 : 8[e000] -> 9[13000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5785:6809 [0] NCCL INFO Channel 01/0 : 8[e000] -> 9[13000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5791:6805 [6] NCCL INFO Connected all rings
vc-816648091-20231223-a1d29e4b-worker-1:5790:6807 [5] NCCL INFO Connected all rings
vc-816648091-20231223-a1d29e4b-worker-1:5789:6806 [4] NCCL INFO Channel 00/0 : 12[93000] -> 13[99000] via P2P/IPC/read
vc-816648091-20231223-a1d29e4b-worker-1:5789:6806 [4] NCCL INFO Channel 01/0 : 12[93000] -> 13[99000] via P2P/IPC/read vc-816648091-20231223-a1d29e4b-worker-1:5790:6807 [5] NCCL INFO Channel 00/0 : 13[99000] -> 14[cb000] via P2P/IPC/read vc-816648091-20231223-a1d29e4b-worker-1:5791:6805 [6] NCCL INFO Channel 00/0 : 14[cb000] -> 15[d0000] via P2P/IPC/read vc-816648091-20231223-a1d29e4b-worker-1:5786:6811 [1] NCCL INFO Connected all trees vc-816648091-20231223-a1d29e4b-worker-1:5786:6811 [1] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512 vc-816648091-20231223-a1d29e4b-worker-1:5786:6811 [1] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer vc-816648091-20231223-a1d29e4b-worker-1:5785:6809 [0] NCCL INFO Channel 00/0 : 8[e000] -> 15[d0000] via P2P/IPC/read vc-816648091-20231223-a1d29e4b-worker-1:5790:6807 [5] NCCL INFO Channel 01/0 : 13[99000] -> 14[cb000] via P2P/IPC/read vc-816648091-20231223-a1d29e4b-worker-1:5791:6805 [6] NCCL INFO Channel 01/0 : 14[cb000] -> 15[d0000] via P2P/IPC/read vc-816648091-20231223-a1d29e4b-worker-1:5785:6809 [0] NCCL INFO Channel 01/0 : 8[e000] -> 15[d0000] via P2P/IPC/read vc-816648091-20231223-a1d29e4b-worker-1:5790:6807 [5] NCCL INFO Connected all trees vc-816648091-20231223-a1d29e4b-worker-1:5790:6807 [5] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512 vc-816648091-20231223-a1d29e4b-worker-1:5790:6807 [5] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer vc-816648091-20231223-a1d29e4b-worker-1:5791:6805 [6] NCCL INFO Connected all trees vc-816648091-20231223-a1d29e4b-worker-1:5791:6805 [6] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512 vc-816648091-20231223-a1d29e4b-worker-1:5791:6805 [6] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer vc-816648091-20231223-a1d29e4b-worker-1:5792:6810 [7] NCCL INFO Channel 00/0 : 15[d0000] -> 8[e000] via P2P/IPC/read vc-816648091-20231223-a1d29e4b-worker-1:5792:6810 [7] NCCL INFO Channel 01/0 : 15[d0000] -> 8[e000] via 
P2P/IPC/read vc-816648091-20231223-a1d29e4b-worker-1:5792:6810 [7] NCCL INFO Connected all trees vc-816648091-20231223-a1d29e4b-worker-1:5792:6810 [7] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512 vc-816648091-20231223-a1d29e4b-worker-1:5792:6810 [7] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer vc-816648091-20231223-a1d29e4b-worker-1:5785:6809 [0] NCCL INFO Connected all trees vc-816648091-20231223-a1d29e4b-worker-1:5785:6809 [0] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512 vc-816648091-20231223-a1d29e4b-worker-1:5785:6809 [0] NCCL INFO 2 coll vc-816648091-20231223-a1d29e4b-worker-3:6072:7101 [3] NCCL INFO Channel 01/0 : 27[51000] -> 2[4b000] [send] via NET/IB/0/GDRDMA vc-816648091-20231223-a1d29e4b-worker-3:6071:7098 [2] NCCL INFO Channel 01/0 : 19[51000] -> 26[4b000] [receive] via NET/IB/0/GDRDMA vc-816648091-20231223-a1d29e4b-worker-3:6072:7101 [3] NCCL INFO Connected all rings vc-816648091-20231223-a1d29e4b-worker-3:6072:7101 [3] NCCL INFO Channel 00/0 : 27[51000] -> 28[93000] via P2P/IPC/read vc-816648091-20231223-a1d29e4b-worker-3:6071:7098 [2] NCCL INFO Connected all rings vc-816648091-20231223-a1d29e4b-worker-1:5788:6808 [3] NCCL INFO Connected all rings vc-816648091-20231223-a1d29e4b-worker-1:5788:6808 [3] NCCL INFO Channel 00/0 : 11[51000] -> 12[93000] via P2P/IPC/read vc-816648091-20231223-a1d29e4b-worker-1:5787:6804 [2] NCCL INFO Connected all rings vc-816648091-20231223-a1d29e4b-worker-1:5787:6804 [2] NCCL INFO Channel 00/0 : 10[4b000] -> 11[51000] via P2P/IPC/read vc-816648091-20231223-a1d29e4b-worker-1:5788:6808 [3] NCCL INFO Channel 01/0 : 11[51000] -> 12[93000] via P2P/IPC/read vc-816648091-20231223-a1d29e4b-worker-1:5787:6804 [2] NCCL INFO Channel 01/0 : 10[4b000] -> 11[51000] via P2P/IPC/read vc-816648091-20231223-a1d29e4b-worker-1:5789:6806 [4] NCCL INFO Channel 00/0 : 12[93000] -> 11[51000] via P2P/IPC/read vc-816648091-20231223-a1d29e4b-worker-1:5789:6806 [4] NCCL INFO Channel
01/0 : 12[93000] -> 11[51000] via P2P/IPC/read vc-816648091-20231223-a1d29e4b-worker-1:5787:6804 [2] NCCL INFO Channel 01/0 : 2[4b000] -> 10[4b000] [receive] via NET/IB/0/GDRDMA vc-816648091-20231223-a1d29e4b-worker-1:5788:6808 [3] NCCL INFO Channel 01/0 : 18[4b000] -> 11[51000] [receive] via NET/IB/0/GDRDMA vc-816648091-20231223-a1d29e4b-worker-1:5787:6804 [2] NCCL INFO Channel 00/0 : 10[4b000] -> 19[51000] [send] via NET/IB/0/GDRDMA vc-816648091-20231223-a1d29e4b-worker-1:5787:6804 [2] NCCL INFO Channel 01/0 : 26[4b000] -> 10[4b000] [receive] via NET/IB/0/GDRDMA vc-816648091-20231223-a1d29e4b-worker-1:5787:6804 [2] NCCL INFO Channel 01/0 : 10[4b000] -> 26[4b000] [send] via vc-816648091-20231223-a1d29e4b-worker-1:5787:6804 [2] NCCL INFO Channel 00/0 : 19[51000] -> 10[4b000] [receive] vc-816648091-20231223-a1d29e4b-worker-1:5787:6804 [2] NCCL INFO Channel 01/0 : 10[4b000] -> 2[4b000] [send] via NET/IB/0/GDRDMA vc-816648091-20231223-a1d29e4b-worker-1:5788:6808 [3] NCCL INFO Channel 00/0 : 11[51000] -> 10[4b000] via P2P/IPC/read vc-816648091-20231223-a1d29e4b-worker-1:5788:6808 [3] NCCL INFO Channel 01/0 : 11[51000] -> 10[4b000] via P2P/IPC/read vc-816648091-20231223-a1d29e4b-worker-1:5789:6806 [4] NCCL INFO Connected all trees vc-816648091-20231223-a1d29e4b-worker-1:5789:6806 [4] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512 vc-816648091-20231223-a1d29e4b-worker-1:5789:6806 [4] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer vc-816648091-20231223-a1d29e4b-worker-1:5787:6804 [2] NCCL INFO Connected all trees vc-816648091-20231223-a1d29e4b-worker-1:5787:6804 [2] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 512 | 512 vc-816648091-20231223-a1d29e4b-worker-1:5787:6804 [2] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer vc-816648091-20231223-a1d29e4b-worker-1:5788:6808 [3] NCCL INFO Connected all trees vc-816648091-20231223-a1d29e4b-worker-1:5788:6808 [3] NCCL INFO threadThresholds
8/8/64 | 256/8/64 | 512 | 512 vc-816648091-20231223-a1d29e4b-worker-1:5788:6808 [3] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer vc-816648091-20231223-a1d29e4b-worker-1:5787:6804 [2] NCCL INFO comm 0x69223ab0 rank 10 nranks 32 cudaDev 2 busId 4b000 - Init COMPLETE vc-816648091-20231223-a1d29e4b-worker-1:5788:6808 [3] NCCL INFO comm 0x68e7b190 rank 11 nranks 32 cudaDev 3 busId 51000 - Init COMPLETE vc-816648091-20231223-a1d29e4b-worker-1:5791:6805 [6] NCCL INFO comm 0x688e7690 rank 14 nranks 32 cudaDev 6 busId 12/25/2023 02:59:40 - WARNING - utils.llama_patch - Padded sequences are less efficient in FlashAttention. 12/25/2023 02:59:40 - WARNING - utils.llama_patch - Padded sequences are less efficient in FlashAttention. 12/25/2023 02:59:40 - WARNING - utils.llama_patch - Padded sequences are less efficient in FlashAttention. 12/25/2023 02:59:40 - WARNING - utils.llama_patch - Padded sequences are less efficient in FlashAttention. 12/25/2023 02:59:40 - WARNING - utils.llama_patch - Padded sequences are less efficient in FlashAttention. 12/25/2023 02:59:40 - WARNING - utils.llama_patch - Padded sequences are less efficient in FlashAttention. 12/25/2023 02:59:40 - WARNING - utils.llama_patch - Padded 12/25/2023 02:59:39 - WARNING - utils.llama_patch - Padded sequences are less efficient in FlashAttention.
12/25/2023 02:59:39 - WARNING - utils.llama_patch{'loss': 0.6109, 'learning_rate': 1.9998183053318547e-05, 'epoch': 0.01} {'loss': 0.5587, 'learning_rate': 1.9992732873533223e-05, 'epoch': 0.02} {'loss': 0.5352, 'learning_rate': 1.9983651441181253e-05, 'epoch': 0.04} {'loss': 0.5315, 'learning_rate': 1.997094205635831e-05, 'epoch': 0.05} {'loss': 0.5115, 'learning_rate': 1.99546093375193e-05, 'epoch': 0.06} {'loss': 0.5241, 'learning_rate': 1.99346592198001e-05, 'epoch': 0.07} {'loss': 0.5085, 'learning_rate': 1.9911098952860726e-05, 'epoch': 0.08} {'loss': 0.4903, 'learning_rate': 1.9883937098250962e-05, 'epoch': 0.1} {'loss': 0.5028, 'learning_rate': 1.985318352629912e-05, 'epoch': 0.11} {'loss': 0.5097, 'learning_rate': 1.9818849412525294e-05, 'epoch': 0.12} {'loss': 0.5015, 'learning_rate': 1.978094723358031e-05, 'epoch': 0.13} {'loss': 0.5086, 'learning_rate': 1.9739490762711812e-05, 'epoch': 0.15} {'loss': 0.497, 'learning_rate': 1.969449506475924e-05, 'epoch': 0.16} {'loss': 0.5085, 'learning_rate': 1.9645976490679402e-05, 'epoch': 0.17} {'loss': 0.4994, 'learning_rate': 1.9593952671604737e-05, 'epoch': 0.18} {'loss': 0.4923, 'learning_rate': 1.953844251243633e-05, 'epoch': 0.19} {'loss': 0.4887, 'learning_rate': 1.947946618497407e-05, 'epoch': 0.21} {'loss': 0.496, 'learning_rate': 1.941704512058646e-05, 'epoch': 0.22} {'loss': 0.5039, 'learning_rate': 1.9351202002422654e-05, 'epoch': 0.23} {'loss': 0.4888, 'learning_rate': 1.928196075716966e-05, 'epoch': 0.24} {'loss': 0.4972, 'learning_rate': 1.920934654635764e-05, 'epoch': 0.25} {'loss': 0.4968, 'learning_rate': 1.9133385757216458e-05, 'epoch': 0.27} {'loss': 0.4952, 'learning_rate': 1.905410599308687e-05, 'epoch': 0.28} {'loss': 0.4831, 'learning_rate': 1.8971536063389745e-05, 'epoch': 0.29} {'loss': 0.4836, 'learning_rate': 1.888570597315703e-05, 'epoch': 0.3} {'loss': 0.4873, 'learning_rate': 1.8796646912128247e-05, 'epoch': 0.32} {'loss': 0.4773, 'learning_rate': 1.8704391243416478e-05, 'epoch': 
0.33} {'loss': 0.49, 'learning_rate': 1.8608972491747946e-05, 'epoch': 0.34} {'loss': 0.4892, 'learning_rate': 1.8510425331279488e-05, 'epoch': 0.35} {'loss': 0.4794, 'learning_rate': 1.8408785572998335e-05, 'epoch': 0.36} {'loss': 0.4829, 'learning_rate': 1.8304090151708797e-05, 'epoch': 0.38} {'loss': 0.4814, 'learning_rate': 1.8196377112610524e-05, 'epoch': 0.39} {'loss': 0.4871, 'learning_rate': 1.808568559747331e-05, 'epoch': 0.4} {'loss': 0.4824, 'learning_rate': 1.7972055830413372e-05, 'epoch': 0.41} {'loss': 0.483, 'learning_rate': 1.7855529103276337e-05, 'epoch': 0.42} {'loss': 0.4701, 'learning_rate': 1.773614776063225e-05, 'epoch': 0.44} {'loss': 0.4859, 'learning_rate': 1.761395518438797e-05, 'epoch': 0.45} {'loss': 0.4622, 'learning_rate': 1.7488995778022687e-05, 'epoch': 0.46} {'loss': 0.4784, 'learning_rate': 1.7361314950452137e-05, 'epoch': 0.47} {'loss': 0.4888, 'learning_rate': 1.7230959099527512e-05, 'epoch': 0.49} {'loss': 0.4734, 'learning_rate': 1.709797559517496e-05, 'epoch': 0.5} {'loss': 0.4801, 'learning_rate': 1.6962412762181867e-05, 'epoch': 0.51} {'loss': 0.4895, 'learning_rate': 1.6824319862636137e-05, 'epoch': 0.52} {'loss': 0.4671, 'learning_rate': 1.6683747078024887e-05, 'epoch': 0.53} {'loss': 0.4881, 'learning_rate': 1.654074549099901e-05, 'epoch': 0.55} {'loss': 0.4781, 'learning_rate': 1.6395367066810312e-05, 'epoch': 0.56} {'loss': 0.488, 'learning_rate': 1.6247664634427866e-05, 'epoch': 0.57} {'loss': 0.4708, 'learning_rate': 1.6097691867340547e-05, 'epoch': 0.58} {'loss': 0.4811, 'learning_rate': 1.5945503264052638e-05, 'epoch': 0.59} {'loss': 0.4651, 'learning_rate': 1.5791154128279694e-05, 'epoch': 0.61} {'loss': 0.463, 'learning_rate': 1.5634700548851713e-05, 'epoch': 0.62} {'loss': 0.4769, 'learning_rate': 1.547619937933108e-05, 'epoch': 0.63} {'loss': 0.4791, 'learning_rate': 1.53157082173526e-05, 'epoch': 0.64} {'loss': 0.4905, 'learning_rate': 1.5153285383693091e-05, 'epoch': 0.65} {'loss': 0.4607, 'learning_rate': 
1.4988989901078286e-05, 'epoch': 0.67} {'loss': 0.4764, 'learning_rate': 1.4822881472734563e-05, 'epoch': 0.68} {'loss': 0.489, 'learning_rate': 1.4655020460693452e-05, 'epoch': 0.69} {'loss': 0.4744, 'learning_rate': 1.4485467863856704e-05, 'epoch': 0.7} {'loss': 0.4722, 'learning_rate': 1.4314285295829957e-05, 'epoch': 0.72} {'loss': 0.4588, 'learning_rate': 1.4141534962532986e-05, 'epoch': 0.73} {'loss': 0.4779, 'learning_rate': 1.3967279639594753e-05, 'epoch': 0.74} {'loss': 0.4629, 'learning_rate': 1.3791582649541404e-05, 'epoch': 0.75} {'loss': 0.4826, 'learning_rate': 1.3614507838785547e-05, 'epoch': 0.76} {'loss': 0.4615, 'learning_rate': 1.3436119554425133e-05, 'epoch': 0.78} {'loss': 0.476, 'learning_rate': 1.3256482620860415e-05, 'epoch': 0.79} {'loss': 0.4746, 'learning_rate': 1.3075662316237466e-05, 'epoch': 0.8} {'loss': 0.4627, 'learning_rate': 1.2893724348726757e-05, 'epoch': 0.81} {'loss': 0.4645, 'learning_rate': 1.2710734832645557e-05, 'epoch': 0.82} {'loss': 0.472, 'learning_rate': 1.2526760264432658e-05, 'epoch': 0.84} {'loss': 0.4653, 'learning_rate': 1.2341867498484303e-05, 'epoch': 0.85} {'loss': 0.4713, 'learning_rate': 1.2156123722859989e-05, 'epoch': 0.86} {'loss': 0.4661, 'learning_rate': 1.1969596434867063e-05, 'epoch': 0.87} {'loss': 0.4761, 'learning_rate': 1.1782353416532908e-05, 'epoch': 0.89} {'loss': 0.4617, 'learning_rate': 1.1594462709973684e-05, 'epoch': 0.9} {'loss': 0.4595, 'learning_rate': 1.140599259266854e-05, 'epoch': 0.91} {'loss': 0.4677, 'learning_rate': 1.1217011552648316e-05, 'epoch': 0.92} {'loss': 0.4712, 'learning_rate': 1.102758826360772e-05, 'epoch': 0.93} {'loss': 0.4613, 'learning_rate': 1.0837791559950029e-05, 'epoch': 0.95} {'loss': 0.4598, 'learning_rate': 1.0647690411773415e-05, 'epoch': 0.96} {'loss': 0.4803, 'learning_rate': 1.0457353899807947e-05, 'epoch': 0.97} {'loss': 0.4549, 'learning_rate': 1.0266851190312375e-05, 'epoch': 0.98} {'loss': 0.463, 'learning_rate': 1.0076251509939867e-05, 'epoch': 
0.99} [2023-12-25 12:56:04,297] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /group/20025/jiuding/ckpt/34B_final/checkpoint-824/global_step824/zero_pp_rank_24_mp_rank_00_model_states.pt... [2023-12-25 12:56:04,334] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /group/20025/jiuding/ckpt/34B_final/checkpoint-824/global_step824/zero_pp_rank_24_mp_rank_00_model_states.pt. [2023-12-25 12:56:04,374] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /group/20025/jiuding/ckpt/34B_final/checkpoint-824/global_step824/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... [2023-12-25 12:56:37,474] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /group/20025/jiuding/ckpt/34B_final/checkpoint-824/global_step824/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. [2023-12-25 12:56:37,474] [INFO] [engine.py:3285:_save_zero_checkpoint] zero checkpoint saved /group/20025/jiuding/ckpt/34B_final/checkpoint-824/global_step824/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt [2023-12-25 12:56:37,487] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step824 is ready now! 
{'loss': 0.4129, 'learning_rate': 9.885624120581772e-06, 'epoch': 1.01} {'loss': 0.3427, 'learning_rate': 9.695038294198588e-06, 'epoch': 1.02} {'loss': 0.3457, 'learning_rate': 9.504563287647265e-06, 'epoch': 1.03} {'loss': 0.3401, 'learning_rate': 9.314268317514023e-06, 'epoch': 1.04} {'loss': 0.3354, 'learning_rate': 9.12422253496175e-06, 'epoch': 1.06} {'loss': 0.3391, 'learning_rate': 8.934495000601241e-06, 'epoch': 1.07} {'loss': 0.3428, 'learning_rate': 8.745154659395273e-06, 'epoch': 1.08} {'loss': 0.3339, 'learning_rate': 8.556270315604779e-06, 'epoch': 1.09} {'loss': 0.328, 'learning_rate': 8.36791060778608e-06, 'epoch': 1.1} {'loss': 0.3328, 'learning_rate': 8.180143983848388e-06, 'epoch': 1.12} {'loss': 0.3288, 'learning_rate': 7.993038676180546e-06, 'epoch': 1.13} {'loss': 0.3327, 'learning_rate': 7.806662676856134e-06, 'epoch': 1.14} {'loss': 0.3421, 'learning_rate': 7.62108371292584e-06, 'epoch': 1.15} {'loss': 0.3462, 'learning_rate': 7.436369221806201e-06, 'epoch': 1.16} {'loss': 0.3302, 'learning_rate': 7.2525863267735405e-06, 'epoch': 1.18} {'loss': 0.3278, 'learning_rate': 7.069801812572117e-06, 'epoch': 1.19} {'loss': 0.3402, 'learning_rate': 6.888082101145222e-06, 'epoch': 1.2} {'loss': 0.344, 'learning_rate': 6.707493227498187e-06, 'epoch': 1.21} {'loss': 0.3349, 'learning_rate': 6.5281008157019425e-06, 'epoch': 1.22} {'loss': 0.3379, 'learning_rate': 6.3499700550459554e-06, 'epoch': 1.24} {'loss': 0.3398, 'learning_rate': 6.173165676349103e-06, 'epoch': 1.25} {'loss': 0.3362, 'learning_rate': 5.99775192843722e-06, 'epoch': 1.26} {'loss': 0.3316, 'learning_rate': 5.823792554795738e-06, 'epoch': 1.27} {'loss': 0.3406, 'learning_rate': 5.6513507704059835e-06, 'epoch': 1.29} {'loss': 0.3266, 'learning_rate': 5.480489238773536e-06, 'epoch': 1.3} {'loss': 0.3393, 'learning_rate': 5.311270049156967e-06, 'epoch': 1.31} {'loss': 0.3286, 'learning_rate': 5.14375469400529e-06, 'epoch': 1.32} {'loss': 0.3338, 'learning_rate': 4.978004046612224e-06, 
'epoch': 1.33} {'loss': 0.3247, 'learning_rate': 4.814078338995516e-06, 'epoch': 1.35} {'loss': 0.3184, 'learning_rate': 4.652037140009259e-06, 'epoch': 1.36} {'loss': 0.3297, 'learning_rate': 4.491939333697205e-06, 'epoch': 1.37} {'loss': 0.3378, 'learning_rate': 4.333843097894932e-06, 'epoch': 1.38} {'loss': 0.3426, 'learning_rate': 4.177805883088641e-06, 'epoch': 1.39} {'loss': 0.3354, 'learning_rate': 4.023884391538244e-06, 'epoch': 1.41} {'loss': 0.3348, 'learning_rate': 3.8721345566724156e-06, 'epoch': 1.42} {'loss': 0.3314, 'learning_rate': 3.722611522762917e-06, 'epoch': 1.43} {'loss': 0.3358, 'learning_rate': 3.575369624885803e-06, 'epoch': 1.44} {'loss': 0.3258, 'learning_rate': 3.4304623691766193e-06, 'epoch': 1.46} {'loss': 0.3296, 'learning_rate': 3.287942413386841e-06, 'epoch': 1.47} {'loss': 0.3364, 'learning_rate': 3.147861547748612e-06, 'epoch': 1.48} {'loss': 0.3206, 'learning_rate': 3.0102706761547264e-06, 'epoch': 1.49} {'loss': 0.3214, 'learning_rate': 2.875219797660681e-06, 'epoch': 1.5} {'loss': 0.3307, 'learning_rate': 2.7427579883155895e-06, 'epoch': 1.52} {'loss': 0.33, 'learning_rate': 2.612933383328432e-06, 'epoch': 1.53} {'loss': 0.3343, 'learning_rate': 2.4857931595762406e-06, 'epoch': 1.54} {'loss': 0.3258, 'learning_rate': 2.3613835184605527e-06, 'epoch': 1.55} {'loss': 0.3277, 'learning_rate': 2.239749669118272e-06, 'epoch': 1.56} {'loss': 0.3368, 'learning_rate': 2.1209358119931845e-06, 'epoch': 1.58} {'loss': 0.3297, 'learning_rate': 2.0049851227739746e-06, 'epoch': 1.59} {'loss': 0.3367, 'learning_rate': 1.891939736704641e-06, 'epoch': 1.6} {'loss': 0.3344, 'learning_rate': 1.7818407332729914e-06, 'epoch': 1.61} {'loss': 0.3388, 'learning_rate': 1.6747281212828193e-06, 'epoch': 1.63} {'loss': 0.3333, 'learning_rate': 1.570640824315095e-06, 'epoch': 1.64} {'loss': 0.3313, 'learning_rate': 1.4696166665835853e-06, 'epoch': 1.65} {'loss': 0.3355, 'learning_rate': 1.3716923591899167e-06, 'epoch': 1.66} {'loss': 0.3287, 
'learning_rate': 1.2769034867831588e-06, 'epoch': 1.67} {'loss': 0.3294, 'learning_rate': 1.1852844946287434e-06, 'epoch': 1.69} {'loss': 0.3369, 'learning_rate': 1.0968686760914248e-06, 'epoch': 1.7} {'loss': 0.3289, 'learning_rate': 1.0116881605368112e-06, 'epoch': 1.71} {'loss': 0.3242, 'learning_rate': 9.297739016559226e-07, 'epoch': 1.72} {'loss': 0.3295, 'learning_rate': 8.511556662169218e-07, 'epoch': 1.73} {'loss': 0.3257, 'learning_rate': 7.758620232482083e-07, 'epoch': 1.75} {'loss': 0.3278, 'learning_rate': 7.039203336567247e-07, 'epoch': 1.76} {'loss': 0.3199, 'learning_rate': 6.353567402853056e-07, 'epoch': 1.77} {'loss': 0.3378, 'learning_rate': 5.701961584126392e-07, 'epoch': 1.78} {'loss': 0.3218, 'learning_rate': 5.084622666993244e-07, 'epoch': 1.8} {'loss': 0.326, 'learning_rate': 4.501774985832974e-07, 'epoch': 1.81} {'loss': 0.3359, 'learning_rate': 3.953630341277603e-07, 'epoch': 1.82} {'loss': 0.3304, 'learning_rate': 3.440387923245714e-07, 'epoch': 1.83} {'loss': 0.3239, 'learning_rate': 2.9622342385589256e-07, 'epoch': 1.84} {'loss': 0.333, 'learning_rate': 2.519343043167399e-07, 'epoch': 1.86} {'loss': 0.316, 'learning_rate': 2.111875279008657e-07, 'epoch': 1.87} {'loss': 0.3345, 'learning_rate': 1.7399790155230633e-07, 'epoch': 1.88} {'loss': 0.3284, 'learning_rate': 1.4037893958469994e-07, 'epoch': 1.89} {'loss': 0.3235, 'learning_rate': 1.1034285877032147e-07, 'epoch': 1.9} {'loss': 0.3304, 'learning_rate': 8.390057390064266e-08, 'epoch': 1.92} {'loss': 0.3197, 'learning_rate': 6.10616938200137e-08, 'epoch': 1.93} {'loss': 0.3394, 'learning_rate': 4.183451793390747e-08, 'epoch': 1.94} {'loss': 0.3267, 'learning_rate': 2.6226033193007538e-08, 'epoch': 1.95} {'loss': 0.3329, 'learning_rate': 1.424191155422583e-08, 'epoch': 1.96} {'loss': 0.3307, 'learning_rate': 5.886507919570239e-09, 'epoch': 1.98} {'loss': 0.3305, 'learning_rate': 1.1628585536216374e-09, 'epoch': 1.99} [2023-12-25 23:09:26,239] [INFO] [torch_checkpoint_engine.py:21:save] 
[Torch] Saving /group/20025/jiuding/ckpt/34B_final/checkpoint-1648/global_step1648/zero_pp_rank_24_mp_rank_00_model_states.pt... [2023-12-25 23:09:26,269] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /group/20025/jiuding/ckpt/34B_final/checkpoint-1648/global_step1648/zero_pp_rank_24_mp_rank_00_model_states.pt. [2023-12-25 23:09:27,989] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /group/20025/jiuding/ckpt/34B_final/checkpoint-1648/global_step1648/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... [2023-12-25 23:10:00,433] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /group/20025/jiuding/ckpt/34B_final/checkpoint-1648/global_step1648/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. [2023-12-25 23:10:00,433] [INFO] [engine.py:3285:_save_zero_checkpoint] zero checkpoint saved /group/20025/jiuding/ckpt/34B_final/checkpoint-1648/global_step1648/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt [2023-12-25 23:10:01,042] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step1648 is ready now! {'train_runtime': 72622.4905, 'train_samples_per_second': 5.811, 'train_steps_per_second': 0.023, 'train_loss': 0.4087800704738469, 'epoch': 2.0} g/ckpt/34B_final/checkpoint-1648/global_step1648/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... [2023-12-25 23:09:59,045] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /group/20025/jiuding/ckpt/34B_final/checkpoint-1648/global_step1648/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. [2023-12-25 23:09:59,045] [INFO] [engine.py:3285:_save_zero_checkpoint] zero checkpoint saved /group/20025/jiuding/ckpt/34B_final/checkpoint-1648/global_step1648/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt [2023-12-25 23:10:01,039] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step1648 is ready now! 
{'train_runtime': 72622.4769, 'train_samples_per_second': 5.811, 'train_steps_per_second': 0.023, 'train_loss': 0.40862118040473716, 'epoch': 2.0} vc-816648091-20231223-a1d29e4b-worker-2:5776:6817 [2] NCCL INFO [Service thread] Connection closed by localRank 1 vc-816648091-20231223-a1d29e4b-worker-2:5776:6680 [2] NCCL INFO [Service thread] Connection closed by localRank 1 vc-816648091-20231223-a1d29e4b-worker-2:5776:6817 [2] NCCL INFO [Service thread] Connection closed by localRank 5 vc-816648091-20231223-a1d29e4b-worker-2:5776:6680 [2] NCCL INFO [Service thread] Connection closed by localRank 5 d 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5788:6691 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5788:6691 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5788:6691 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5788:6691 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 721748225 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5788:6691 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5788:6691 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5788:6691 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5788:6691 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5788:6691 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1886872682 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5788:6691 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 
542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5788:6691 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5788:6691 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5788:6691 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1347896394 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5788:6691 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5788:6691 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5788:6691 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5788:6691 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5788:6691 [3] NCCL INFO [Service thread] Connection closed by localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5788:6691 [3] NCCL INFO [Service thread] Connection closed by localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5788:6691 [3] NCCL INFO [Service thread] Connection closed by localRank -1 {'loss': 0.3302, 'learning_rate': 7.2525863267735405e-06, 'epoch': 1.18} {'loss': 0.3278, 'learning_rate': 7.069801812572117e-06, 'epoch': 1.19} {'loss': 0.3402, 'learning_rate': 6.888082101145222e-06, 'epoch': 1.2} vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1886872682 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 
vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1347896394 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] 
proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1868983913 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 721748225 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] NCCL INFO [Service thread] Connection closed by localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] NCCL INFO [Service thread] Connection closed by localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] NCCL INFO [Service thread] Connection closed by localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1886872682 from localRank -1 
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1347896394 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 721748225 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1868983913 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] NCCL INFO [Service thread] Connection closed by localRank -1
{'loss': 0.344, 'learning_rate': 6.707493227498187e-06, 'epoch': 1.21}
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1886872682 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1347896394 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 721748225 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1868983913 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1868983913 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 721748225 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1886872682 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1347896394 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] NCCL INFO [Service thread] Connection closed by localRank -1
{'loss': 0.3349, 'learning_rate': 6.5281008157019425e-06, 'epoch': 1.22}
vc-816648091-20231223-a1d29e4b-worker-1:5792:6687 [7] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5792:6687 [7] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1868983913 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5792:6687 [7] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5792:6687 [7] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5792:6687 [7] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5792:6687 [7] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5792:6687 [7] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5792:6687 [7] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5792:6687 [7] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5792:6687 [7] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5792:6687 [7] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5792:6687 [7] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5792:6687 [7] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5792:6687 [7] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 721748225 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5792:6687 [7] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5792:6687 [7] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5792:6687 [7] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5792:6687 [7] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1886872682 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5792:6687 [7] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5792:6687 [7] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1347896394 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5792:6687 [7] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5792:6687 [7] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5792:6687 [7] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1868983913 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 721748225 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1886872682 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1347896394 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6819 [0] NCCL INFO [Service thread] Connection closed by localRank -1
{'loss': 0.3379, 'learning_rate': 6.3499700550459554e-06, 'epoch': 1.24}
vc-816648091-20231223-a1d29e4b-worker-1:5789:6815 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5789:6815 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5789:6815 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5789:6815 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5789:6815 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1886872682 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5789:6815 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5789:6815 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5789:6815 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1347896394 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5789:6815 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1868983913 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5789:6815 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5789:6815 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 721748225 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5789:6815 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5789:6815 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5789:6815 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5789:6815 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5789:6815 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5789:6815 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5789:6815 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5789:6815 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5789:6815 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5789:6815 [4] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5789:6815 [4] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5789:6815 [4] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5791:6686 [6] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5791:6686 [6] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5791:6686 [6] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5791:6686 [6] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5791:6686 [6] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 721748225 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5791:6686 [6] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5791:6686 [6] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1886872682 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5791:6686 [6] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5791:6686 [6] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5791:6686 [6] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5791:6686 [6] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1347896394 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5791:6686 [6] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5791:6686 [6] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1868983913 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5791:6686 [6] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5791:6686 [6] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5791:6686 [6] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5791:6686 [6] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5791:6686 [6] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5791:6686 [6] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5791:6686 [6] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5791:6686 [6] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5791:6686 [6] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5791:6686 [6] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1886872682 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1347896394 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1868983913 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 721748225 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5785:6684 [0] NCCL INFO [Service thread] Connection closed by localRank -1
{'loss': 0.3398, 'learning_rate': 6.173165676349103e-06, 'epoch': 1.25}
vc-816648091-20231223-a1d29e4b-worker-1:5786:6685 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5786:6685 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5786:6685 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 721748225 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5786:6685 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5786:6685 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5786:6685 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5786:6685 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5786:6685 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5786:6685 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5786:6685 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5786:6685 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5786:6685 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5786:6685 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5786:6685 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5786:6685 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1868983913 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5786:6685 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5786:6685 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1886872682 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5786:6685 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5786:6685 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5786:6685 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1347896394 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5786:6685 [1] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5786:6685 [1] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5786:6685 [1] NCCL INFO [Service thread] Connection closed by localRank -1
{'loss': 0.3362, 'learning_rate': 5.99775192843722e-06, 'epoch': 1.26}
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 721748225 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1868983913 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1886872682 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1347896394 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5788:6818 [3] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1886872682 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1868983913 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1347896394 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 721748225 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-1:5790:6689 [5] NCCL INFO [Service thread] Connection closed by localRank -1
{'loss': 0.3316, 'learning_rate': 5.823792554795738e-06, 'epoch': 1.27}
{'loss': 0.3406, 'learning_rate': 5.6513507704059835e-06, 'epoch': 1.29}
{'loss': 0.3266, 'learning_rate': 5.480489238773536e-06, 'epoch': 1.3}
{'loss': 0.3393, 'learning_rate': 5.311270049156967e-06, 'epoch': 1.31}
{'loss': 0.3286, 'learning_rate': 5.14375469400529e-06, 'epoch': 1.32}
{'loss': 0.3338, 'learning_rate': 4.978004046612224e-06, 'epoch': 1.33}
{'loss': 0.3247, 'learning_rate': 4.814078338995516e-06, 'epoch': 1.35}
{'loss': 0.3184, 'learning_rate': 4.652037140009259e-06, 'epoch': 1.36}
{'loss': 0.3297, 'learning_rate': 4.491939333697205e-06, 'epoch': 1.37}
{'loss': 0.3378, 'learning_rate': 4.333843097894932e-06, 'epoch': 1.38}
{'loss': 0.3426, 'learning_rate': 4.177805883088641e-06, 'epoch': 1.39}
{'loss': 0.3354, 'learning_rate': 4.023884391538244e-06, 'epoch': 1.41}
{'loss': 0.3348, 'learning_rate': 3.8721345566724156e-06, 'epoch': 1.42}
{'loss': 0.3314, 'learning_rate': 3.722611522762917e-06, 'epoch': 1.43}
{'loss': 0.3358, 'learning_rate': 3.575369624885803e-06, 'epoch': 1.44}
{'loss': 0.3258, 'learning_rate': 3.4304623691766193e-06, 'epoch': 1.46}
{'loss': 0.3296, 'learning_rate': 3.287942413386841e-06, 'epoch': 1.47}
{'loss': 0.3364, 'learning_rate': 3.147861547748612e-06, 'epoch': 1.48}
{'loss': 0.3206, 'learning_rate': 3.0102706761547264e-06, 'epoch': 1.49}
{'loss': 0.3214, 'learning_rate': 2.875219797660681e-06, 'epoch': 1.5}
{'loss': 0.3307, 'learning_rate': 2.7427579883155895e-06, 'epoch': 1.52}
{'loss': 0.33, 'learning_rate': 2.612933383328432e-06, 'epoch': 1.53}
{'loss': 0.3343, 'learning_rate': 2.4857931595762406e-06, 'epoch': 1.54}
{'loss': 0.3258, 'learning_rate': 2.3613835184605527e-06, 'epoch': 1.55}
{'loss': 0.3277, 'learning_rate': 2.239749669118272e-06, 'epoch': 1.56}
{'loss': 0.3368, 'learning_rate': 2.1209358119931845e-06, 'epoch': 1.58}
{'loss': 0.3297, 'learning_rate': 2.0049851227739746e-06, 'epoch': 1.59}
{'loss': 0.3367, 'learning_rate': 1.891939736704641e-06, 'epoch': 1.6}
{'loss': 0.3344, 'learning_rate': 1.7818407332729914e-06, 'epoch': 1.61}
{'loss': 0.3388, 'learning_rate': 1.6747281212828193e-06, 'epoch': 1.63}
{'loss': 0.3333, 'learning_rate': 1.570640824315095e-06, 'epoch': 1.64}
{'loss': 0.3313, 'learning_rate': 1.4696166665835853e-06, 'epoch': 1.65}
{'loss': 0.3355, 'learning_rate': 1.3716923591899167e-06, 'epoch': 1.66}
{'loss': 0.3287, 'learning_rate': 1.2769034867831588e-06, 'epoch': 1.67}
{'loss': 0.3294, 'learning_rate': 1.1852844946287434e-06, 'epoch': 1.69}
{'loss': 0.3369, 'learning_rate': 1.0968686760914248e-06, 'epoch': 1.7}
{'loss': 0.3289, 'learning_rate': 1.0116881605368112e-06, 'epoch': 1.71}
{'loss': 0.3242, 'learning_rate': 9.297739016559226e-07, 'epoch': 1.72}
{'loss': 0.3295, 'learning_rate': 8.511556662169218e-07, 'epoch': 1.73}
{'loss': 0.3257, 'learning_rate': 7.758620232482083e-07, 'epoch': 1.75}
{'loss': 0.3278, 'learning_rate': 7.039203336567247e-07, 'epoch': 1.76}
{'loss': 0.3199, 'learning_rate': 6.353567402853056e-07, 'epoch': 1.77}
{'loss': 0.3378, 'learning_rate': 5.701961584126392e-07, 'epoch': 1.78}
{'loss': 0.3218, 'learning_rate': 5.084622666993244e-07, 'epoch': 1.8}
{'loss': 0.326, 'learning_rate': 4.501774985832974e-07, 'epoch': 1.81}
{'loss': 0.3359, 'learning_rate': 3.953630341277603e-07, 'epoch': 1.82}
{'loss': 0.3304, 'learning_rate': 3.440387923245714e-07, 'epoch': 1.83}
{'loss': 0.3239, 'learning_rate': 2.9622342385589256e-07, 'epoch': 1.84}
{'loss': 0.333, 'learning_rate': 2.519343043167399e-07, 'epoch': 1.86}
{'loss': 0.316, 'learning_rate': 2.111875279008657e-07, 'epoch': 1.87}
{'loss': 0.3345, 'learning_rate': 1.7399790155230633e-07, 'epoch': 1.88}
{'loss': 0.3284, 'learning_rate': 1.4037893958469994e-07, 'epoch': 1.89}
{'loss': 0.3235, 'learning_rate': 1.1034285877032147e-07, 'epoch': 1.9}
{'loss': 0.3304, 'learning_rate': 8.390057390064266e-08, 'epoch': 1.92}
{'loss': 0.3197, 'learning_rate': 6.10616938200137e-08, 'epoch': 1.93}
{'loss': 0.3394, 'learning_rate': 4.183451793390747e-08, 'epoch': 1.94}
{'loss': 0.3267, 'learning_rate': 2.6226033193007538e-08, 'epoch': 1.95}
{'loss': 0.3329, 'learning_rate': 1.424191155422583e-08, 'epoch': 1.96}
{'loss': 0.3307, 'learning_rate': 5.886507919570239e-09, 'epoch': 1.98}
{'loss': 0.3305, 'learning_rate': 1.1628585536216374e-09, 'epoch': 1.99}
[2023-12-25 23:09:26,271] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /group/20025/jiuding/ckpt/34B_final/checkpoint-1648/global_step1648/zero_pp_rank_8_mp_rank_00_model_states.pt...
[2023-12-25 23:09:26,293] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /group/20025/jiuding/ckpt/34B_final/checkpoint-1648/global_step1648/zero_pp_rank_8_mp_rank_00_model_states.pt.
[2023-12-25 23:09:28,022] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /group/20025/jiuding/ckpt/34B_final/checkpoint-1648/global_step1648/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt...
[2023-12-25 23:10:00,280] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /group/20025/jiuding/ckpt/34B_final/checkpoint-1648/global_step1648/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt.
[2023-12-25 23:10:00,280] [INFO] [engine.py:3285:_save_zero_checkpoint] zero checkpoint saved /group/20025/jiuding/ckpt/34B_final/checkpoint-1648/global_step1648/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt
[2023-12-25 23:10:01,074] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step1648 is ready now!
{'train_runtime': 72622.6275, 'train_samples_per_second': 5.811, 'train_steps_per_second': 0.023, 'train_loss': 0.40901411404308763, 'epoch': 2.0}
vc-816648091-20231223-a1d29e4b-worker-1:5787:6816 [0] NCCL INFO [Service thread] Connection closed by localRank 3
vc-816648091-20231223-a1d29e4b-worker-1:5787:6688 [2] NCCL INFO [Service thread] Connection closed by localRank 3
ccelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 04:07:35,810] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 04:08:18,743] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 04:09:01,272] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 04:09:44,857] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 04:10:28,810] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption.
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:11:11,783] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:11:54,934] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.5097, 'learning_rate': 1.9818849412525294e-05, 'epoch': 0.12} [2023-12-25 04:12:38,808] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:13:21,260] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:14:04,108] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:14:48,497] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:15:31,278] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:16:14,222] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:16:57,148] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:17:41,658] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:18:25,418] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:19:08,700] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.5015, 'learning_rate': 1.978094723358031e-05, 'epoch': 0.13} [2023-12-25 04:19:51,656] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:20:35,397] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:21:18,717] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:22:01,877] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:22:45,731] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:23:29,392] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:24:12,922] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:24:56,113] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:25:39,219] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:26:22,270] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.5086, 'learning_rate': 1.9739490762711812e-05, 'epoch': 0.15} [2023-12-25 04:27:06,282] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:27:49,383] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:28:32,601] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:29:15,418] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:29:58,691] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:30:42,211] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:31:25,499] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:32:08,573] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:32:52,240] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:33:35,298] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.497, 'learning_rate': 1.969449506475924e-05, 'epoch': 0.16} [2023-12-25 04:34:19,245] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:35:01,743] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:35:43,835] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:36:27,537] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:37:10,894] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:37:53,539] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:38:37,980] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:39:21,292] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:40:05,958] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:40:48,988] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.5085, 'learning_rate': 1.9645976490679402e-05, 'epoch': 0.17} [2023-12-25 04:41:32,036] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:42:15,603] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:42:58,842] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:43:41,390] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:44:24,847] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:45:06,945] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:45:49,905] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:46:34,006] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:47:18,929] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 04:48:01,762] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4994, 'learning_rate': 1.9593952671604737e-05, 'epoch': 0.18}
[2023-12-25 04:48:45,059] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 04:49:28,806] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 04:50:11,950] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 04:50:54,999] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 04:51:38,751] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 04:52:20,978] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 04:53:05,561] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 04:53:48,286] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 04:54:31,065] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 04:55:13,741] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4923, 'learning_rate': 1.953844251243633e-05, 'epoch': 0.19}
[2023-12-25 04:55:56,119] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 04:56:39,847] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 04:57:22,921] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 04:58:06,549] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 04:58:49,698] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 04:59:32,254] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:00:15,391] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:00:58,002] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:01:40,714] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:02:24,091] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4887, 'learning_rate': 1.947946618497407e-05, 'epoch': 0.21}
[2023-12-25 05:03:07,001] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:03:50,315] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:04:34,016] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:05:18,338] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:06:01,471] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:06:45,104] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:07:28,609] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:08:12,146] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:08:55,572] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:09:38,859] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.496, 'learning_rate': 1.941704512058646e-05, 'epoch': 0.22}
[2023-12-25 05:10:23,065] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:11:06,541] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:11:49,857] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:12:33,755] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:13:18,334] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:14:01,985] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:14:46,052] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:15:28,870] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:16:12,294] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:16:56,371] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.5039, 'learning_rate': 1.9351202002422654e-05, 'epoch': 0.23}
[2023-12-25 05:17:38,523] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:18:21,734] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:19:05,908] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:19:49,385] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:20:32,761] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:21:16,215] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:21:59,162] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:22:41,937] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:23:25,656] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:24:09,175] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4888, 'learning_rate': 1.928196075716966e-05, 'epoch': 0.24}
[2023-12-25 05:24:52,031] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:25:35,103] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:26:18,833] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:27:01,899] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:27:46,773] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:28:29,934] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:29:13,070] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:29:55,671] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:30:38,657] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:31:22,185] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4972, 'learning_rate': 1.920934654635764e-05, 'epoch': 0.25}
[2023-12-25 05:32:06,278] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:32:49,575] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:33:32,238] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:34:14,974] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:34:57,327] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:35:40,620] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:36:23,885] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:37:07,390] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:37:50,564] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:38:33,681] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4968, 'learning_rate': 1.9133385757216458e-05, 'epoch': 0.27}
[2023-12-25 05:39:18,155] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:40:01,570] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:40:45,716] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:41:28,646] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:42:11,578] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:42:54,665] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:43:38,431] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:44:21,836] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:45:06,385] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:45:50,259] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4952, 'learning_rate': 1.905410599308687e-05, 'epoch': 0.28}
[2023-12-25 05:46:32,984] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:47:16,183] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:47:59,701] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 05:48:42,494] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption.
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 05:49:24,979] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 05:50:08,006] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 05:50:51,166] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 05:51:33,940] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 05:52:18,164] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 05:53:00,940] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.4831, 'learning_rate': 1.8971536063389745e-05, 'epoch': 0.29} [2023-12-25 05:53:45,022] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 05:54:28,095] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 05:55:10,698] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 05:55:53,707] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 05:56:38,173] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 05:57:21,241] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 05:58:03,580] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 05:58:48,882] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 05:59:32,541] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:00:16,902] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.4836, 'learning_rate': 1.888570597315703e-05, 'epoch': 0.3} [2023-12-25 06:00:59,613] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:01:42,516] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:02:26,207] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:03:10,000] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:03:53,040] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:04:36,916] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:05:19,756] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:06:02,594] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:06:47,340] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:07:30,421] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.4873, 'learning_rate': 1.8796646912128247e-05, 'epoch': 0.32} [2023-12-25 06:08:13,679] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:08:56,696] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:09:40,388] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:10:23,227] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:11:07,279] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:11:50,537] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:12:33,978] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:13:18,449] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:14:01,601] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:14:46,276] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.4773, 'learning_rate': 1.8704391243416478e-05, 'epoch': 0.33} [2023-12-25 06:15:29,234] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:16:13,080] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:16:56,874] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:17:40,286] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:18:23,946] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:19:06,612] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:19:49,462] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:20:31,794] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:21:15,286] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:21:59,008] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.49, 'learning_rate': 1.8608972491747946e-05, 'epoch': 0.34} [2023-12-25 06:22:42,819] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:23:26,762] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:24:09,594] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:24:52,597] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:25:35,999] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 06:26:18,889] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:27:01,986] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:27:46,686] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:28:30,050] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:29:13,976] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4892, 'learning_rate': 1.8510425331279488e-05, 'epoch': 0.35}
[2023-12-25 06:29:57,142] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:30:39,361] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:31:22,209] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:32:06,349] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:32:49,803] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:33:33,219] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:34:16,608] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:34:59,299] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:35:42,453] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:36:26,334] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4794, 'learning_rate': 1.8408785572998335e-05, 'epoch': 0.36}
[2023-12-25 06:37:09,139] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:37:51,929] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:38:34,906] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:39:18,083] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:40:00,536] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:40:44,062] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:41:26,980] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:42:09,615] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:42:52,641] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:43:34,999] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4829, 'learning_rate': 1.8304090151708797e-05, 'epoch': 0.38}
[2023-12-25 06:44:18,474] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:45:00,439] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:45:43,065] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:46:26,566] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:47:09,506] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:47:52,956] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:48:36,058] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:49:19,136] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:50:02,595] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:50:46,960] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4814, 'learning_rate': 1.8196377112610524e-05, 'epoch': 0.39}
[2023-12-25 06:51:30,788] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:52:13,802] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:52:56,762] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:53:39,422] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:54:22,881] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:55:06,212] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:55:49,377] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:56:32,120] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:57:15,223] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:57:57,933] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4871, 'learning_rate': 1.808568559747331e-05, 'epoch': 0.4}
[2023-12-25 06:58:40,967] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 06:59:24,138] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:00:07,403] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:00:50,096] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:01:33,974] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:02:18,787] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:03:01,372] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:03:45,193] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:04:28,649] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:05:12,211] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4824, 'learning_rate': 1.7972055830413372e-05, 'epoch': 0.41}
[2023-12-25 07:05:55,338] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:06:38,739] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:07:21,979] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:08:04,118] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:08:48,067] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:09:31,472] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:10:14,681] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:10:58,006] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:11:41,439] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:12:24,591] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.483, 'learning_rate': 1.7855529103276337e-05, 'epoch': 0.42}
[2023-12-25 07:13:06,742] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:13:50,691] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:14:33,872] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:15:16,706] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:16:00,102] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:16:43,116] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:17:26,061] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:18:09,063] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:18:52,346] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:19:35,837] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4701, 'learning_rate': 1.773614776063225e-05, 'epoch': 0.44}
[2023-12-25 07:20:18,720] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:21:02,152] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:21:46,649] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:22:30,357] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:23:13,197] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:23:56,089] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:24:39,167] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:25:22,691] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:26:06,144] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 07:26:48,722] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption.
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.4859, 'learning_rate': 1.761395518438797e-05, 'epoch': 0.45} [2023-12-25 07:27:31,722] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:28:15,256] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:28:57,984] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:29:41,840] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:30:26,374] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:31:10,046] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:31:53,417] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:32:36,676] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:33:19,605] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:34:02,476] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.4622, 'learning_rate': 1.7488995778022687e-05, 'epoch': 0.46} [2023-12-25 07:34:46,254] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:35:29,494] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:36:11,421] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:36:54,845] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:37:38,643] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:38:21,373] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:39:06,104] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:39:48,917] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:40:31,972] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:41:15,415] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.4784, 'learning_rate': 1.7361314950452137e-05, 'epoch': 0.47} [2023-12-25 07:41:58,644] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:42:42,188] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:43:25,269] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:44:07,920] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:44:51,001] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:45:34,887] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:46:19,177] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:47:02,491] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:47:46,970] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:48:29,817] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.4888, 'learning_rate': 1.7230959099527512e-05, 'epoch': 0.49} [2023-12-25 07:49:13,428] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:49:56,668] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:50:39,732] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:51:21,988] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:52:06,145] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:52:49,325] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:53:31,605] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:54:14,730] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:54:58,150] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:55:40,502] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.4734, 'learning_rate': 1.709797559517496e-05, 'epoch': 0.5} [2023-12-25 07:56:24,005] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:57:07,253] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:57:49,471] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:58:32,246] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:59:14,520] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 07:59:57,821] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 08:00:41,256] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 08:01:24,240] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 08:02:07,503] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 08:02:51,035] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.4801, 'learning_rate': 1.6962412762181867e-05, 'epoch': 0.51} [2023-12-25 08:03:34,045] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 08:04:16,625] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:04:59,956] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:05:42,892] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:06:25,752] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:07:08,653] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:07:50,933] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:08:33,652] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:09:17,000] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:10:00,008] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4895, 'learning_rate': 1.6824319862636137e-05, 'epoch': 0.52}
[2023-12-25 08:10:44,543] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:11:27,412] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:12:10,049] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:12:53,441] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:13:37,496] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:14:20,739] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:15:03,110] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:15:46,710] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:16:29,707] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:17:13,384] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4671, 'learning_rate': 1.6683747078024887e-05, 'epoch': 0.53}
[2023-12-25 08:17:56,446] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:18:39,201] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:19:21,497] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:20:06,406] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:20:49,089] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:21:32,086] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:22:14,897] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:22:57,754] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:23:40,885] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:24:24,536] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4881, 'learning_rate': 1.654074549099901e-05, 'epoch': 0.55}
[2023-12-25 08:25:07,602] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:25:51,003] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:26:34,197] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:27:18,652] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:28:02,102] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:28:46,422] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:29:29,501] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:30:12,457] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:30:56,262] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:31:40,027] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4781, 'learning_rate': 1.6395367066810312e-05, 'epoch': 0.56}
[2023-12-25 08:32:23,484] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:33:06,514] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:33:48,940] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:34:32,562] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:35:15,237] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:35:58,354] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:36:41,686] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:37:24,819] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:38:07,947] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:38:50,182] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.488, 'learning_rate': 1.6247664634427866e-05, 'epoch': 0.57}
[2023-12-25 08:39:33,829] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:40:17,476] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:40:59,702] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:41:43,544] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:42:26,565] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:43:10,225] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:43:52,492] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:44:36,359] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:45:19,241] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:46:00,944] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4708, 'learning_rate': 1.6097691867340547e-05, 'epoch': 0.58}
[2023-12-25 08:46:44,391] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:47:27,908] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:48:11,291] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:48:53,815] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:49:38,912] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:50:23,159] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:51:07,236] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:51:50,357] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:52:33,376] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:53:16,957] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4811, 'learning_rate': 1.5945503264052638e-05, 'epoch': 0.59}
[2023-12-25 08:53:59,693] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:54:43,468] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:55:26,245] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:56:08,979] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:56:51,642] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:57:35,081] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:58:18,830] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:59:01,529] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 08:59:46,272] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 09:00:29,787] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4651, 'learning_rate': 1.5791154128279694e-05, 'epoch': 0.61}
[2023-12-25 09:01:13,817] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 09:01:55,957] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 09:02:39,176] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 09:03:21,380] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 09:04:05,844] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 09:04:49,690] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption.
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:05:32,910] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:06:15,246] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:06:58,810] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:07:41,529] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.463, 'learning_rate': 1.5634700548851713e-05, 'epoch': 0.62} [2023-12-25 09:08:24,842] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:09:07,989] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:09:50,377] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:10:33,638] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:11:16,939] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:12:00,245] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:12:42,825] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:13:25,883] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:14:08,539] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:14:50,807] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.4769, 'learning_rate': 1.547619937933108e-05, 'epoch': 0.63} [2023-12-25 09:15:33,341] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:16:16,998] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:16:59,515] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:17:42,561] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:18:25,762] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:19:08,472] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:19:52,212] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:20:35,682] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:21:18,614] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:22:02,037] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.4791, 'learning_rate': 1.53157082173526e-05, 'epoch': 0.64} [2023-12-25 09:22:45,602] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:23:28,146] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:24:11,272] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:24:54,149] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:25:37,750] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:26:21,018] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:27:05,840] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:27:48,293] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:28:31,719] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:29:14,527] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.4905, 'learning_rate': 1.5153285383693091e-05, 'epoch': 0.65} [2023-12-25 09:29:57,497] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:30:40,677] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:31:23,319] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:32:06,728] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:32:51,197] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:33:34,400] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:34:18,920] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:35:02,467] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:35:46,735] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:36:29,913] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.4607, 'learning_rate': 1.4988989901078286e-05, 'epoch': 0.67} [2023-12-25 09:37:13,354] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:37:56,503] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:38:39,797] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:39:23,998] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:40:07,181] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:40:50,417] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:41:33,224] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 09:42:16,023] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 09:42:58,750] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 09:43:41,926] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4764, 'learning_rate': 1.4822881472734563e-05, 'epoch': 0.68}
[2023-12-25 09:44:25,839] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 09:45:09,537] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 09:45:52,607] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 09:46:36,232] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 09:47:19,874] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 09:48:03,183] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 09:48:47,288] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 09:49:30,670] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 09:50:14,161] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 09:50:57,773] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.489, 'learning_rate': 1.4655020460693452e-05, 'epoch': 0.69}
[2023-12-25 09:51:40,712] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 09:52:24,982] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 09:53:08,593] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 09:53:52,314] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 09:54:34,770] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 09:55:18,595] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 09:56:02,083] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 09:56:46,466] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 09:57:30,286] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 09:58:13,572] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4744, 'learning_rate': 1.4485467863856704e-05, 'epoch': 0.7}
[2023-12-25 09:58:56,707] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 09:59:39,451] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:00:22,344] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:01:06,167] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:01:49,007] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:02:32,914] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:03:17,917] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:04:01,204] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:04:44,114] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:05:28,004] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4722, 'learning_rate': 1.4314285295829957e-05, 'epoch': 0.72}
[2023-12-25 10:06:10,192] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:06:53,066] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:07:37,078] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:08:19,937] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:09:03,092] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:09:47,383] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:10:30,540] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:11:14,423] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:11:57,550] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:12:40,865] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4588, 'learning_rate': 1.4141534962532986e-05, 'epoch': 0.73}
[2023-12-25 10:13:25,079] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:14:07,902] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:14:50,368] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:15:33,972] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:16:18,503] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:17:02,167] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:17:45,803] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:18:29,087] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:19:13,117] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:19:55,654] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4779, 'learning_rate': 1.3967279639594753e-05, 'epoch': 0.74}
[2023-12-25 10:20:39,070] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:21:22,908] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:22:06,328] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:22:48,885] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:23:31,646] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:24:14,456] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:24:58,043] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:25:41,252] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:26:24,124] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:27:06,930] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4629, 'learning_rate': 1.3791582649541404e-05, 'epoch': 0.75}
[2023-12-25 10:27:50,808] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:28:34,134] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:29:17,725] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:30:00,797] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:30:43,455] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:31:26,362] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:32:08,916] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:32:52,014] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:33:35,677] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:34:19,057] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4826, 'learning_rate': 1.3614507838785547e-05, 'epoch': 0.76}
[2023-12-25 10:35:01,867] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:35:45,036] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:36:27,736] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:37:09,997] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:37:52,782] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:38:36,097] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:39:18,821] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:40:01,523] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:40:46,284] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:41:29,095] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4615, 'learning_rate': 1.3436119554425133e-05, 'epoch': 0.78}
[2023-12-25 10:42:12,423] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 10:42:55,735] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption.
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 10:43:38,323] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 10:44:20,918] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 10:45:03,073] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 10:45:47,207] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 10:46:30,504] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 10:47:12,590] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 10:47:56,015] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 10:48:38,898] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.476, 'learning_rate': 1.3256482620860415e-05, 'epoch': 0.79} [2023-12-25 10:49:21,782] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 10:50:05,466] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 10:50:49,201] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 10:51:32,546] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 10:52:15,371] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 10:52:58,232] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 10:53:41,590] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 10:54:24,610] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 10:55:07,550] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 10:55:50,953] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.4746, 'learning_rate': 1.3075662316237466e-05, 'epoch': 0.8} [2023-12-25 10:56:34,671] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 10:57:18,547] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 10:58:01,818] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 10:58:46,353] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 10:59:29,358] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:00:12,591] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:00:55,416] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:01:38,192] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:02:20,640] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:03:04,000] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.4627, 'learning_rate': 1.2893724348726757e-05, 'epoch': 0.81} [2023-12-25 11:03:47,733] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:04:31,018] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:05:13,923] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:05:56,926] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:06:39,729] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:07:23,154] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:08:06,682] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:08:50,767] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:09:33,995] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:10:18,128] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.4645, 'learning_rate': 1.2710734832645557e-05, 'epoch': 0.82} [2023-12-25 11:10:59,747] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:11:42,597] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:12:26,205] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:13:08,664] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:13:50,922] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:14:33,374] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:15:16,126] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:15:59,249] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:16:41,951] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:17:25,233] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.472, 'learning_rate': 1.2526760264432658e-05, 'epoch': 0.84} [2023-12-25 11:18:08,645] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:18:51,383] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:19:35,253] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:20:19,354] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:21:02,481] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:21:46,268] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:22:29,244] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:23:12,751] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:23:56,667] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:24:40,041] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.4653, 'learning_rate': 1.2341867498484303e-05, 'epoch': 0.85} [2023-12-25 11:25:22,503] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:26:06,498] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:26:49,069] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:27:31,448] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:28:14,677] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:28:57,389] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:29:40,244] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:30:23,773] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:31:06,418] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:31:48,985] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.4713, 'learning_rate': 1.2156123722859989e-05, 'epoch': 0.86} [2023-12-25 11:32:32,455] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:33:15,660] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:33:57,885] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:34:41,433] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:35:24,112] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:36:06,621] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:36:48,904] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:37:32,470] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:38:15,991] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:38:58,844] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.4661, 'learning_rate': 1.1969596434867063e-05, 'epoch': 0.87} [2023-12-25 11:39:41,428] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:40:24,710] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:41:07,755] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:41:50,679] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:42:34,245] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:43:18,212] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:44:00,785] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:44:43,525] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:45:26,386] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:46:10,065] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.4761, 'learning_rate': 1.1782353416532908e-05, 'epoch': 0.89} [2023-12-25 11:46:52,921] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:47:36,255] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:48:18,839] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:49:01,905] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:49:46,255] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:50:28,324] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:51:11,647] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:51:55,183] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:52:38,881] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:53:21,478] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.4617, 'learning_rate': 1.1594462709973684e-05, 'epoch': 0.9} [2023-12-25 11:54:05,592] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:54:48,900] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:55:31,777] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:56:14,309] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:56:57,235] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:57:40,677] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:58:24,373] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:59:07,472] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 11:59:51,024] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 12:00:33,479] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.4595, 'learning_rate': 1.140599259266854e-05, 'epoch': 0.91} [2023-12-25 12:01:18,641] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 12:02:01,402] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 12:02:45,578] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 12:03:28,801] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 12:04:11,502] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 12:04:54,381] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 12:05:38,471] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 12:06:20,779] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 12:07:05,596] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 12:07:47,819] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.4677, 'learning_rate': 1.1217011552648316e-05, 'epoch': 0.92} [2023-12-25 12:08:31,170] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 12:09:13,852] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 12:09:56,683] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 12:10:39,661] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 12:11:22,466] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 12:12:06,336] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 12:12:49,400] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 12:13:33,018] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 12:14:15,691] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 12:14:57,499] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4712, 'learning_rate': 1.102758826360772e-05, 'epoch': 0.93}
[2023-12-25 12:15:40,374] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:16:23,150] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:17:06,773] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:17:49,904] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption.
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:18:32,919] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:19:15,142] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:19:58,293] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:20:41,748] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption.
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:21:25,119] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:22:08,358] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4613, 'learning_rate': 1.0837791559950029e-05, 'epoch': 0.95}
[2023-12-25 12:22:51,485] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:23:34,602] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption.
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:24:18,409] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:25:01,391] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:25:45,397] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:26:28,954] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption.
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:27:12,252] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:27:55,699] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:28:39,243] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:29:23,182] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption.
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4598, 'learning_rate': 1.0647690411773415e-05, 'epoch': 0.96}
[2023-12-25 12:30:06,221] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:30:49,066] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:31:31,856] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:32:14,818] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption.
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:32:57,217] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:33:40,540] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:34:23,747] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:35:06,248] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption.
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:35:48,828] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:36:32,420] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4803, 'learning_rate': 1.0457353899807947e-05, 'epoch': 0.97}
[2023-12-25 12:37:16,287] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:37:58,875] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption.
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:38:42,269] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:39:26,139] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:40:10,089] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:40:53,660] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption.
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:41:39,029] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:42:21,096] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:43:04,029] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:43:48,170] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption.
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4549, 'learning_rate': 1.0266851190312375e-05, 'epoch': 0.98}
[2023-12-25 12:44:31,264] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:45:14,644] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:45:58,015] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:46:41,115] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption.
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:47:24,443] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:48:08,653] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:48:51,651] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:49:35,330] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption.
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:50:18,589] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:51:01,096] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.463, 'learning_rate': 1.0076251509939867e-05, 'epoch': 0.99}
[2023-12-25 12:51:44,596] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:52:28,082] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption.
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:53:10,327] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:53:53,664] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:56:04,292] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step824 is about to be saved!
[2023-12-25 12:56:04,310] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: /group/20025/jiuding/ckpt/34B_final/checkpoint-824/global_step824/zero_pp_rank_0_mp_rank_00_model_states.pt
[2023-12-25 12:56:04,310] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /group/20025/jiuding/ckpt/34B_final/checkpoint-824/global_step824/zero_pp_rank_0_mp_rank_00_model_states.pt...
[2023-12-25 12:56:04,338] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /group/20025/jiuding/ckpt/34B_final/checkpoint-824/global_step824/zero_pp_rank_0_mp_rank_00_model_states.pt.
[2023-12-25 12:56:04,389] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /group/20025/jiuding/ckpt/34B_final/checkpoint-824/global_step824/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2023-12-25 12:56:34,712] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /group/20025/jiuding/ckpt/34B_final/checkpoint-824/global_step824/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2023-12-25 12:56:34,737] [INFO] [engine.py:3285:_save_zero_checkpoint] zero checkpoint saved /group/20025/jiuding/ckpt/34B_final/checkpoint-824/global_step824/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2023-12-25 12:56:37,503] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step824 is ready now!
[2023-12-25 12:56:59,927] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:57:42,763] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:58:26,423] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption.
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:59:09,822] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 12:59:52,908] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 13:00:35,335] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.4129, 'learning_rate': 9.885624120581772e-06, 'epoch': 1.01}
[2023-12-25 13:01:18,783] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption.
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 13:02:01,518] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 13:02:44,414] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 13:03:28,582] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 13:04:11,966] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption.
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:04:54,926] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:05:39,291] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:06:22,023] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:07:05,755] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:07:48,827] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3427, 'learning_rate': 9.695038294198588e-06, 'epoch': 1.02} [2023-12-25 13:08:32,085] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:09:14,540] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:09:56,690] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:10:39,896] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:11:23,524] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:12:06,265] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:12:48,510] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:13:30,844] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:14:13,973] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:14:57,898] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3457, 'learning_rate': 9.504563287647265e-06, 'epoch': 1.03} [2023-12-25 13:15:41,083] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:16:24,532] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:17:07,305] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:17:49,975] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:18:32,820] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:19:15,714] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:19:58,471] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:20:42,567] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:21:25,834] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:22:08,930] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3401, 'learning_rate': 9.314268317514023e-06, 'epoch': 1.04} [2023-12-25 13:22:52,154] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:23:35,628] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:24:19,389] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:25:03,019] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:25:46,648] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:26:29,774] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:27:12,472] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:27:55,250] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:28:38,431] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:29:22,026] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3354, 'learning_rate': 9.12422253496175e-06, 'epoch': 1.06} [2023-12-25 13:30:06,191] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:30:49,691] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:31:32,511] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:32:15,623] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:32:58,920] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:33:42,034] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:34:25,205] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:35:07,951] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:35:50,709] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:36:34,254] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3391, 'learning_rate': 8.934495000601241e-06, 'epoch': 1.07} [2023-12-25 13:37:18,830] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:38:02,023] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:38:46,366] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:39:29,368] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:40:12,755] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:40:56,401] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:41:39,728] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:42:23,604] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:43:07,194] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:43:50,623] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3428, 'learning_rate': 8.745154659395273e-06, 'epoch': 1.08} [2023-12-25 13:44:33,807] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:45:18,425] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:46:01,691] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:46:46,808] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:47:30,170] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:48:13,898] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:48:57,192] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:49:40,716] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:50:24,077] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:51:07,916] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3339, 'learning_rate': 8.556270315604779e-06, 'epoch': 1.09} [2023-12-25 13:51:51,694] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:52:34,855] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:53:18,915] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:54:01,939] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:54:45,801] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:55:29,014] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:56:12,746] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:56:55,666] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:57:39,564] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:58:22,535] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.328, 'learning_rate': 8.36791060778608e-06, 'epoch': 1.1} [2023-12-25 13:59:06,223] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 13:59:49,379] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:00:32,347] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:01:15,605] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:01:58,187] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:02:41,777] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:03:25,660] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:04:08,586] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:04:51,481] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:05:34,413] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3328, 'learning_rate': 8.180143983848388e-06, 'epoch': 1.12} [2023-12-25 14:06:16,896] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:07:00,656] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:07:44,137] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:08:27,629] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:09:10,512] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:09:54,195] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:10:39,428] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:11:22,155] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:12:06,587] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:12:50,126] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3288, 'learning_rate': 7.993038676180546e-06, 'epoch': 1.13} [2023-12-25 14:13:33,805] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:14:18,788] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:15:01,257] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:15:45,838] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:16:29,020] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:17:12,029] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:17:55,207] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:18:38,806] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:19:22,943] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:20:06,943] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3327, 'learning_rate': 7.806662676856134e-06, 'epoch': 1.14} [2023-12-25 14:20:49,074] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:21:32,851] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:22:17,493] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:22:59,689] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:23:43,224] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:24:26,570] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:25:08,992] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:25:52,170] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:26:35,510] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:27:18,735] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3421, 'learning_rate': 7.62108371292584e-06, 'epoch': 1.15} [2023-12-25 14:28:01,282] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:28:46,036] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:29:28,683] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:30:12,361] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:30:55,809] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:31:38,800] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:32:21,614] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:33:06,166] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:33:49,077] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:34:31,953] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3462, 'learning_rate': 7.436369221806201e-06, 'epoch': 1.16} [2023-12-25 14:35:15,193] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:35:58,649] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 14:36:42,004] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 14:37:25,788] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 14:38:09,128] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 14:38:52,425] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 14:39:36,213] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 14:40:20,051] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 14:41:04,391] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 14:41:48,289] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3302, 'learning_rate': 7.2525863267735405e-06, 'epoch': 1.18}
[2023-12-25 14:42:32,098] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 14:43:15,475] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 14:43:58,167] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 14:44:41,396] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 14:45:25,341] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 14:46:08,544] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 14:46:52,136] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 14:47:35,651] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 14:48:20,122] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 14:49:03,558] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3278, 'learning_rate': 7.069801812572117e-06, 'epoch': 1.19}
[2023-12-25 14:49:48,543] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 14:50:31,745] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 14:51:14,670] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 14:51:58,487] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 14:52:41,807] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 14:53:25,670] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 14:54:09,113] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 14:54:51,775] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 14:55:34,133] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 14:56:16,853] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3402, 'learning_rate': 6.888082101145222e-06, 'epoch': 1.2}
[2023-12-25 14:56:59,939] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 14:57:43,403] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 14:58:26,803] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 14:59:09,974] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 14:59:53,346] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:00:39,106] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:01:22,864] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:02:06,642] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:02:50,142] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:03:32,634] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.344, 'learning_rate': 6.707493227498187e-06, 'epoch': 1.21}
[2023-12-25 15:04:15,860] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:04:59,725] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:05:42,338] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:06:25,921] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:07:09,597] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:07:52,460] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:08:35,434] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:09:19,150] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:10:01,910] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:10:46,327] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3349, 'learning_rate': 6.5281008157019425e-06, 'epoch': 1.22}
[2023-12-25 15:11:29,995] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:12:12,744] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:12:55,629] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:13:38,757] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:14:22,016] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:15:06,119] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:15:49,985] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:16:33,396] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:17:17,039] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:17:59,897] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3379, 'learning_rate': 6.3499700550459554e-06, 'epoch': 1.24}
[2023-12-25 15:18:43,075] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:19:26,574] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:20:10,672] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:20:54,349] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:21:39,220] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:22:22,617] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:23:06,425] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:23:49,303] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:24:32,253] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:25:15,856] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3398, 'learning_rate': 6.173165676349103e-06, 'epoch': 1.25}
[2023-12-25 15:25:59,851] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:26:42,551] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:27:25,993] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:28:09,281] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:28:52,143] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:29:35,802] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:30:18,886] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:31:01,865] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:31:46,483] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:32:29,383] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3362, 'learning_rate': 5.99775192843722e-06, 'epoch': 1.26}
[2023-12-25 15:33:12,082] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:33:55,882] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:34:38,409] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:35:20,999] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:36:04,001] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:36:47,758] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 15:37:30,870] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption.
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 15:38:14,242] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 15:38:58,380] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 15:39:41,665] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3316, 'learning_rate': 5.823792554795738e-06, 'epoch': 1.27} [2023-12-25 15:40:25,489] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 15:41:08,736] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 15:41:52,403] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 15:42:35,926] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 15:43:18,666] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 15:44:02,060] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 15:44:47,636] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 15:45:30,447] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 15:46:14,046] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 15:46:56,957] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3406, 'learning_rate': 5.6513507704059835e-06, 'epoch': 1.29} [2023-12-25 15:47:40,089] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 15:48:23,434] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 15:49:05,971] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 15:49:48,555] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 15:50:32,314] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 15:51:15,847] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 15:51:58,596] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 15:52:41,940] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 15:53:25,318] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 15:54:08,443] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3266, 'learning_rate': 5.480489238773536e-06, 'epoch': 1.3} [2023-12-25 15:54:51,644] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 15:55:34,863] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 15:56:18,792] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 15:57:01,707] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 15:57:46,036] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 15:58:29,590] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 15:59:12,197] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 15:59:55,513] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:00:38,427] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:01:21,776] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3393, 'learning_rate': 5.311270049156967e-06, 'epoch': 1.31} [2023-12-25 16:02:03,968] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:02:49,823] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:03:33,122] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:04:15,799] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:04:58,834] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:05:42,236] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:06:25,591] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:07:09,487] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:07:52,539] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:08:35,230] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3286, 'learning_rate': 5.14375469400529e-06, 'epoch': 1.32} [2023-12-25 16:09:18,521] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:10:01,236] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:10:46,299] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:11:28,743] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:12:11,468] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:12:54,738] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:13:38,160] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:14:21,178] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:15:05,204] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:15:48,494] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3338, 'learning_rate': 4.978004046612224e-06, 'epoch': 1.33} [2023-12-25 16:16:32,287] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:17:15,608] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:17:59,217] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:18:43,352] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:19:26,435] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:20:09,873] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:20:53,469] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:21:36,566] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:22:22,115] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:23:07,567] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3247, 'learning_rate': 4.814078338995516e-06, 'epoch': 1.35} [2023-12-25 16:23:53,339] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:24:40,636] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:25:27,063] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:26:13,452] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:26:58,519] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:27:44,626] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:28:31,186] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:29:16,354] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:30:06,415] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:30:54,832] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3184, 'learning_rate': 4.652037140009259e-06, 'epoch': 1.36} [2023-12-25 16:31:44,451] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:32:33,878] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:33:22,397] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:34:11,587] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:35:00,682] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:35:47,494] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:36:37,170] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:37:27,229] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:38:14,504] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:39:03,484] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3297, 'learning_rate': 4.491939333697205e-06, 'epoch': 1.37} [2023-12-25 16:39:53,213] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:40:40,886] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:41:29,828] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:42:20,240] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time vc-816648091-20231223-a1d29e4b-worker-0:5797:6826 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5797:6826 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5797:6826 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5797:6826 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5797:6826 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5797:6826 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1886872682 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5797:6826 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5797:6826 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5797:6826 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5797:6826 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1347896394 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5797:6826 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5797:6826 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1868983913 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5797:6826 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 
vc-816648091-20231223-a1d29e4b-worker-0:5797:6826 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5797:6826 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5797:6826 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5797:6826 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 721748225 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5797:6826 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5797:6826 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5797:6826 [3] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5797:6826 [3] NCCL INFO [Service thread] Connection closed by localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5797:6826 [3] NCCL INFO [Service thread] Connection closed by localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5797:6826 [3] NCCL INFO [Service thread] Connection closed by localRank -1 [2023-12-25 16:43:08,864] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:43:57,412] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. 
if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:44:46,335] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:45:34,622] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:46:24,791] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time vc-816648091-20231223-a1d29e4b-worker-0:5798:6693 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6693 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1886872682 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6693 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6693 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6693 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1347896394 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6693 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6693 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6693 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6693 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6693 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6693 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6693 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6693 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 
vc-816648091-20231223-a1d29e4b-worker-0:5798:6693 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1868983913 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6693 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6693 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6693 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6693 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 721748225 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6693 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6693 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6693 [4] NCCL INFO [Service thread] Connection closed by localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6693 [4] NCCL INFO [Service thread] Connection closed by localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6693 [4] NCCL INFO [Service thread] Connection closed by localRank -1 [2023-12-25 16:47:13,817] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3378, 'learning_rate': 4.333843097894932e-06, 'epoch': 1.38} [2023-12-25 16:48:02,756] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. 
this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:48:54,129] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:49:39,411] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:50:24,263] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:51:08,718] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. 
if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:51:54,000] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:52:40,566] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:53:25,338] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:54:09,640] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:54:53,282] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3426, 'learning_rate': 4.177805883088641e-06, 'epoch': 1.39} [2023-12-25 16:55:38,990] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:56:24,337] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 16:57:09,833] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time vc-816648091-20231223-a1d29e4b-worker-0:5798:6824 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6824 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6824 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1868983913 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6824 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6824 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6824 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6824 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6824 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6824 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6824 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6824 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6824 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 vc-816648091-20231223-a1d29e4b-worker-0:5798:6824 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1 
vc-816648091-20231223-a1d29e4b-worker-0:5798:6824 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 721748225 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5798:6824 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5798:6824 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5798:6824 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1886872682 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5798:6824 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5798:6824 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1347896394 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5798:6824 [4] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5798:6824 [4] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5798:6824 [4] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5798:6824 [4] NCCL INFO [Service thread] Connection closed by localRank -1
[2023-12-25 16:57:54,747] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 16:58:39,624] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 16:59:24,437] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:00:08,362] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:00:53,745] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:01:39,480] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:02:24,287] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3354, 'learning_rate': 4.023884391538244e-06, 'epoch': 1.41}
vc-816648091-20231223-a1d29e4b-worker-0:5799:6825 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1886872682 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5799:6825 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5799:6825 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1868983913 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5799:6825 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1347896394 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5799:6825 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5799:6825 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5799:6825 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5799:6825 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 721748225 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5799:6825 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5799:6825 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5799:6825 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5799:6825 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5799:6825 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5799:6825 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5799:6825 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5799:6825 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5799:6825 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5799:6825 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5799:6825 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5799:6825 [5] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5799:6825 [5] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5799:6825 [5] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5799:6825 [5] NCCL INFO [Service thread] Connection closed by localRank -1
[2023-12-25 17:03:09,647] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:03:54,788] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:04:40,402] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:05:26,220] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:06:11,486] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:06:56,656] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:07:42,026] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:08:27,230] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:09:12,458] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:09:58,141] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3348, 'learning_rate': 3.8721345566724156e-06, 'epoch': 1.42}
[2023-12-25 17:10:45,295] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
vc-816648091-20231223-a1d29e4b-worker-0:5795:6822 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5795:6822 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5795:6822 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5795:6822 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 721748225 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5795:6822 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5795:6822 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5795:6822 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5795:6822 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5795:6822 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5795:6822 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1886872682 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5795:6822 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5795:6822 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5795:6822 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5795:6822 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1347896394 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5795:6822 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 1868983913 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5795:6822 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5795:6822 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 16843542 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5795:6822 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5795:6822 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5795:6822 [1] proxy.cc:1111 NCCL WARN [Service thread] Unknown command 542393671 from localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5795:6822 [1] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5795:6822 [1] NCCL INFO [Service thread] Connection closed by localRank -1
vc-816648091-20231223-a1d29e4b-worker-0:5795:6822 [1] NCCL INFO [Service thread] Connection closed by localRank -1
[2023-12-25 17:11:31,489] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:12:20,963] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:13:07,862] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:13:54,373] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:14:40,987] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:15:27,735] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:16:13,373] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:17:00,301] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:17:47,685] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3314, 'learning_rate': 3.722611522762917e-06, 'epoch': 1.43}
[2023-12-25 17:18:34,256] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:19:20,577] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:20:07,198] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:20:53,734] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:21:41,644] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:22:27,508] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:23:14,214] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:24:01,115] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:24:47,099] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:25:33,968] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3358, 'learning_rate': 3.575369624885803e-06, 'epoch': 1.44}
[2023-12-25 17:26:20,900] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:27:08,521] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:27:53,990] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:28:41,384] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:29:27,967] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:30:14,257] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:31:00,630] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:31:47,148] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:32:34,082] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:33:20,285] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3258, 'learning_rate': 3.4304623691766193e-06, 'epoch': 1.46}
[2023-12-25 17:34:06,541] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:34:52,657] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:35:41,659] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:36:29,262] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:37:16,249] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:38:03,069] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:38:50,209] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:39:36,649] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:40:22,752] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:41:08,003] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3296, 'learning_rate': 3.287942413386841e-06, 'epoch': 1.47}
[2023-12-25 17:41:53,472] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:42:40,174] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:43:26,651] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:44:12,472] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:44:58,505] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:45:45,099] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 17:46:30,931] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption.
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 17:47:15,875] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 17:48:01,226] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 17:48:46,404] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3364, 'learning_rate': 3.147861547748612e-06, 'epoch': 1.48} [2023-12-25 17:49:31,228] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 17:50:17,156] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 17:51:01,848] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 17:51:48,169] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 17:52:33,916] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 17:53:18,751] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 17:54:04,305] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 17:54:50,819] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 17:55:35,608] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 17:56:19,810] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3206, 'learning_rate': 3.0102706761547264e-06, 'epoch': 1.49} [2023-12-25 17:57:06,071] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 17:57:51,414] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 17:58:36,941] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 17:59:22,149] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:00:07,865] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:00:53,471] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:01:39,938] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:02:26,351] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:03:11,804] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:03:57,057] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3214, 'learning_rate': 2.875219797660681e-06, 'epoch': 1.5} [2023-12-25 18:04:42,506] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:05:27,454] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:06:12,473] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:06:57,913] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:07:42,429] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:08:26,804] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:09:12,143] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:09:58,203] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:10:44,033] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:11:29,720] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3307, 'learning_rate': 2.7427579883155895e-06, 'epoch': 1.52} [2023-12-25 18:12:15,160] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:12:59,236] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:13:43,565] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:14:27,959] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:15:13,542] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:15:58,873] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:16:44,396] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:17:28,789] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:18:12,919] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:18:57,534] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.33, 'learning_rate': 2.612933383328432e-06, 'epoch': 1.53} [2023-12-25 18:19:42,034] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:20:27,491] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:21:12,371] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:21:57,517] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:22:41,877] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:23:27,279] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:24:12,559] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:24:56,777] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 18:25:41,745] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:26:27,173] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3343, 'learning_rate': 2.4857931595762406e-06, 'epoch': 1.54}
[2023-12-25 18:27:11,710] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:27:57,768] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:28:41,460] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:29:24,448] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:30:08,877] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:30:53,950] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:31:39,693] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:32:25,409] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:33:10,810] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:33:55,375] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3258, 'learning_rate': 2.3613835184605527e-06, 'epoch': 1.55}
[2023-12-25 18:34:40,012] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:35:25,251] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:36:09,768] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:36:55,127] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:37:40,554] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:38:26,167] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:39:10,810] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:39:55,638] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:40:41,006] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:41:25,548] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3277, 'learning_rate': 2.239749669118272e-06, 'epoch': 1.56}
[2023-12-25 18:42:10,451] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:42:56,144] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:43:41,094] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:44:25,795] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:45:10,748] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:45:55,963] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:46:40,211] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:47:26,245] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:48:12,141] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:48:57,283] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3368, 'learning_rate': 2.1209358119931845e-06, 'epoch': 1.58}
[2023-12-25 18:49:42,754] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:50:27,631] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:51:12,824] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:51:57,143] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:52:42,117] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:53:28,347] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:54:13,643] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:54:58,719] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:55:42,769] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:56:27,817] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3297, 'learning_rate': 2.0049851227739746e-06, 'epoch': 1.59}
[2023-12-25 18:57:12,583] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:57:57,659] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:58:42,390] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 18:59:27,832] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:00:13,070] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:00:57,725] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:01:41,145] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:02:26,676] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:03:11,700] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:03:56,458] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3367, 'learning_rate': 1.891939736704641e-06, 'epoch': 1.6}
[2023-12-25 19:04:41,056] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:05:26,425] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:06:11,617] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:06:56,720] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:07:41,222] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:08:27,409] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:09:12,240] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:09:57,346] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:10:42,222] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:11:27,407] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3344, 'learning_rate': 1.7818407332729914e-06, 'epoch': 1.61}
[2023-12-25 19:12:11,459] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:12:55,839] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:13:40,983] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:14:25,559] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:15:10,712] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:15:56,777] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:16:41,691] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:17:26,435] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:18:11,294] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:18:55,184] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3388, 'learning_rate': 1.6747281212828193e-06, 'epoch': 1.63}
[2023-12-25 19:19:39,335] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:20:24,889] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:21:10,069] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:21:55,532] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:22:40,884] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:23:25,964] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:24:10,451] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:24:55,252] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:25:40,515] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:26:25,389] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3333, 'learning_rate': 1.570640824315095e-06, 'epoch': 1.64}
[2023-12-25 19:27:10,604] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:27:55,618] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 19:28:40,904] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption.
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:29:25,131] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:30:10,618] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:30:55,789] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:31:40,443] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:32:25,788] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:33:10,489] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:33:56,209] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3313, 'learning_rate': 1.4696166665835853e-06, 'epoch': 1.65} [2023-12-25 19:34:40,534] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:35:25,000] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:36:09,581] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:36:54,815] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:37:40,342] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:38:25,388] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:39:10,374] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:39:54,935] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:40:40,104] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:41:24,656] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3355, 'learning_rate': 1.3716923591899167e-06, 'epoch': 1.66} [2023-12-25 19:42:09,190] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:42:54,283] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:43:40,041] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:44:24,961] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:45:10,208] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:45:54,558] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:46:40,840] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:47:27,307] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:48:12,337] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:48:56,923] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3287, 'learning_rate': 1.2769034867831588e-06, 'epoch': 1.67} [2023-12-25 19:49:42,231] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:50:27,444] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:51:12,561] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:51:57,659] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:52:42,844] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:53:28,622] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:54:13,942] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:54:58,116] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:55:42,365] [WARNING] [stage3.py:1898:step] 5 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:56:27,221] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3294, 'learning_rate': 1.1852844946287434e-06, 'epoch': 1.69} [2023-12-25 19:57:11,900] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:57:57,129] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:58:42,903] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 19:59:27,797] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 20:00:12,288] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 20:00:57,140] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 20:01:41,943] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 20:02:28,262] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 20:03:13,149] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 20:03:57,872] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3369, 'learning_rate': 1.0968686760914248e-06, 'epoch': 1.7} [2023-12-25 20:04:42,474] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 20:05:28,220] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 20:06:12,672] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 20:06:57,975] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 20:07:43,212] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:08:29,108] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:09:14,869] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:09:59,745] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:10:43,661] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:11:28,702] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3289, 'learning_rate': 1.0116881605368112e-06, 'epoch': 1.71}
[2023-12-25 20:12:13,101] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:12:58,135] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:13:41,956] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:14:27,439] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:15:11,996] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:15:56,155] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:16:41,786] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:17:26,455] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:18:11,584] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:18:57,163] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3242, 'learning_rate': 9.297739016559226e-07, 'epoch': 1.72}
[2023-12-25 20:19:41,759] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:20:26,745] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:21:11,945] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:21:56,545] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:22:42,985] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:23:28,208] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:24:13,468] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:24:57,528] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:25:42,518] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:26:27,850] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3295, 'learning_rate': 8.511556662169218e-07, 'epoch': 1.73}
[2023-12-25 20:27:12,166] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:27:56,537] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:28:41,230] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:29:27,636] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:30:12,380] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:30:56,767] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:31:40,723] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:32:25,832] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:33:10,669] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:33:55,607] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3257, 'learning_rate': 7.758620232482083e-07, 'epoch': 1.75}
[2023-12-25 20:34:40,998] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:35:27,102] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:36:11,494] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:36:55,829] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:37:41,751] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:38:26,490] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:39:10,533] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:39:56,635] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:40:41,060] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:41:26,542] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3278, 'learning_rate': 7.039203336567247e-07, 'epoch': 1.76}
[2023-12-25 20:42:12,427] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:42:57,354] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:43:42,744] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:44:27,381] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:45:12,201] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:45:55,934] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:46:41,229] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:47:26,683] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:48:12,033] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:48:57,714] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3199, 'learning_rate': 6.353567402853056e-07, 'epoch': 1.77}
[2023-12-25 20:49:42,073] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:50:27,295] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:51:12,149] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:51:56,960] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:52:42,331] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:53:28,371] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:54:12,801] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:54:57,478] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:55:41,409] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:56:25,922] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3378, 'learning_rate': 5.701961584126392e-07, 'epoch': 1.78}
[2023-12-25 20:57:10,746] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:57:54,947] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:58:40,715] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 20:59:25,431] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 21:00:10,175] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 21:00:55,271] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 21:01:40,977] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 21:02:26,389] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 21:03:10,901] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 21:03:56,719] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3218, 'learning_rate': 5.084622666993244e-07, 'epoch': 1.8}
[2023-12-25 21:04:41,935] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 21:05:26,916] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 21:06:13,137] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 21:06:58,749] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 21:07:42,692] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 21:08:28,212] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 21:09:13,477] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 21:09:58,232] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 21:10:43,603] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption.
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:11:28,674] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.326, 'learning_rate': 4.501774985832974e-07, 'epoch': 1.81} [2023-12-25 21:12:13,242] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:12:56,183] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:13:39,799] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:14:25,215] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:15:09,515] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:15:53,382] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:16:38,413] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:17:23,159] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:18:07,291] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:18:50,363] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3359, 'learning_rate': 3.953630341277603e-07, 'epoch': 1.82} [2023-12-25 21:19:33,527] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:20:18,919] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:21:03,319] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:21:47,231] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:22:31,552] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:23:15,622] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:23:59,541] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:24:43,850] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:25:26,790] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:26:10,272] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3304, 'learning_rate': 3.440387923245714e-07, 'epoch': 1.83} [2023-12-25 21:26:54,190] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:27:39,184] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:28:23,122] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:29:06,655] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:29:50,856] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:30:34,344] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:31:18,982] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:32:02,424] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:32:47,465] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:33:30,991] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3239, 'learning_rate': 2.9622342385589256e-07, 'epoch': 1.84} [2023-12-25 21:34:14,464] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:34:58,417] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:35:41,862] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:36:26,018] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:37:10,720] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:37:54,889] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:38:39,626] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:39:23,406] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:40:07,194] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:40:51,371] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.333, 'learning_rate': 2.519343043167399e-07, 'epoch': 1.86} [2023-12-25 21:41:35,429] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:42:19,014] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:43:02,858] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:43:48,066] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:44:32,240] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:45:16,507] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:45:59,785] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:46:42,879] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:47:26,635] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 21:48:10,009] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.316, 'learning_rate': 2.111875279008657e-07, 'epoch': 1.87} [2023-12-25 21:48:53,767] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 21:49:39,485] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 21:50:23,166] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 21:51:07,055] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 21:51:51,170] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 21:52:35,036] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 21:53:19,158] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 21:54:02,947] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 21:54:48,128] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 21:55:32,767] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3345, 'learning_rate': 1.7399790155230633e-07, 'epoch': 1.88}
[2023-12-25 21:56:17,450] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 21:57:01,624] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 21:57:45,084] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 21:58:29,026] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 21:59:12,801] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 21:59:57,014] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:00:40,807] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:01:25,178] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:02:09,391] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:02:53,901] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3284, 'learning_rate': 1.4037893958469994e-07, 'epoch': 1.89}
[2023-12-25 22:03:38,472] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:04:22,462] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:05:06,162] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:05:51,489] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:06:35,183] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:07:18,714] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:08:01,329] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:08:45,585] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:09:29,828] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:10:13,293] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3235, 'learning_rate': 1.1034285877032147e-07, 'epoch': 1.9}
[2023-12-25 22:10:58,061] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:11:41,816] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:12:25,659] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:13:11,394] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:13:55,323] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:14:39,881] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:15:23,091] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:16:06,764] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:16:51,450] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:17:36,019] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3304, 'learning_rate': 8.390057390064266e-08, 'epoch': 1.92}
[2023-12-25 22:18:20,206] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:19:03,388] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:19:48,517] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:20:32,330] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:21:17,998] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:22:02,306] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:22:47,892] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:23:30,914] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:24:14,906] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:24:58,582] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3197, 'learning_rate': 6.10616938200137e-08, 'epoch': 1.93}
[2023-12-25 22:25:43,491] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:26:27,274] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:27:11,722] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:27:55,804] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:28:41,019] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:29:26,055] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:30:10,095] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:30:54,078] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:31:39,434] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:32:22,099] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3394, 'learning_rate': 4.183451793390747e-08, 'epoch': 1.94}
[2023-12-25 22:33:06,071] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:33:49,808] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:34:34,406] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:35:18,994] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:36:02,654] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:36:47,800] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:37:30,842] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:38:14,467] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:38:57,936] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:39:41,571] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3267, 'learning_rate': 2.6226033193007538e-08, 'epoch': 1.95}
[2023-12-25 22:40:25,756] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:41:09,130] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:41:53,646] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:42:38,552] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:43:22,441] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:44:06,693] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:44:50,832] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:45:34,451] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:46:19,769] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:47:03,883] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
{'loss': 0.3329, 'learning_rate': 1.424191155422583e-08, 'epoch': 1.96}
[2023-12-25 22:47:48,973] [WARNING] [stage3.py:1898:step] 4 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:48:32,843] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:49:16,768] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:49:59,991] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
[2023-12-25 22:50:44,134] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption.
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 22:51:28,100] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 22:52:12,862] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 22:52:56,444] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 22:53:39,937] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 22:54:23,639] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3307, 'learning_rate': 5.886507919570239e-09, 'epoch': 1.98} [2023-12-25 22:55:08,344] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 22:55:52,165] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 22:56:36,316] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 22:57:20,620] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 22:58:05,140] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 22:58:49,144] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 22:59:33,899] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 23:00:18,850] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 23:01:02,324] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 23:01:47,396] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time {'loss': 0.3305, 'learning_rate': 1.1628585536216374e-09, 'epoch': 1.99} [2023-12-25 23:02:31,363] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 23:03:15,301] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 23:03:59,391] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 23:04:43,132] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 23:05:26,785] [WARNING] [stage3.py:1898:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 23:06:09,803] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 23:06:53,683] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 23:07:37,557] [WARNING] [stage3.py:1898:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time [2023-12-25 23:09:26,221] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step1648 is about to be saved! 
[2023-12-25 23:09:27,918] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: /group/20025/jiuding/ckpt/34B_final/checkpoint-1648/global_step1648/zero_pp_rank_0_mp_rank_00_model_states.pt
[2023-12-25 23:09:27,918] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /group/20025/jiuding/ckpt/34B_final/checkpoint-1648/global_step1648/zero_pp_rank_0_mp_rank_00_model_states.pt...
[2023-12-25 23:09:27,942] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /group/20025/jiuding/ckpt/34B_final/checkpoint-1648/global_step1648/zero_pp_rank_0_mp_rank_00_model_states.pt.
[2023-12-25 23:09:27,985] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /group/20025/jiuding/ckpt/34B_final/checkpoint-1648/global_step1648/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2023-12-25 23:09:59,171] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /group/20025/jiuding/ckpt/34B_final/checkpoint-1648/global_step1648/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2023-12-25 23:09:59,201] [INFO] [engine.py:3285:_save_zero_checkpoint] zero checkpoint saved /group/20025/jiuding/ckpt/34B_final/checkpoint-1648/global_step1648/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2023-12-25 23:10:01,039] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step1648 is ready now!
{'train_runtime': 72622.3244, 'train_samples_per_second': 5.811, 'train_steps_per_second': 0.023, 'train_loss': 0.40841181026500406, 'epoch': 2.0}
***** train metrics *****
  epoch                    =        2.0
  train_loss               =     0.4084
  train_runtime            = 20:10:22.32
  train_samples_per_second =      5.811
  train_steps_per_second   =      0.023
12/25/2023 23:11:57 - INFO - matplotlib.font_manager - generated new fontManager
Figure saved: /group/20025/jiuding/ckpt/34B_final/training_loss.svg
12/25/2023 23:11:58 - WARNING - utils.other - No metric eval_loss to plot.