[2022-12-16 19:19:56,902] [WARNING] [runner.py:179:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2022-12-16 19:19:56,998] [INFO] [runner.py:508:main] cmd = /home/milan/hf_env/bin/python3 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 run_speech_recognition_seq2seq_streaming.py --deepspeed=ds_config.json --model_name_or_path=openai/whisper-large-v2 --dataset_name=mozilla-foundation/common_voice_11_0 --dataset_config_name=cs --language=czech --train_split_name=train+validation --eval_split_name=test --model_index_name=Whisper Large-v2 Czech CV11 v2 --max_steps=5000 --output_dir=./ --per_device_train_batch_size=32 --per_device_eval_batch_size=8 --gradient_accumulation_steps=2 --logging_steps=25 --learning_rate=1e-5 --warmup_steps=500 --evaluation_strategy=steps --eval_steps=1000 --save_strategy=steps --save_steps=1000 --generation_max_length=225 --length_column_name=input_length --max_duration_in_seconds=30 --text_column_name=sentence --freeze_feature_encoder=False --report_to=tensorboard --metric_for_best_model=wer --greater_is_better=False --load_best_model_at_end --gradient_checkpointing --fp16 --overwrite_output_dir --do_train --do_eval --predict_with_generate --do_normalize_eval --streaming=False --use_auth_token --push_to_hub
[2022-12-16 19:19:58,537] [INFO] [launch.py:142:main] WORLD INFO DICT: {'localhost': [0]}
[2022-12-16 19:19:58,537] [INFO] [launch.py:148:main] nnodes=1, num_local_procs=1, node_rank=0
[2022-12-16 19:19:58,537] [INFO] [launch.py:161:main] global_rank_mapping=defaultdict(, {'localhost': [0]})
[2022-12-16 19:19:58,537] [INFO] [launch.py:162:main] dist_world_size=1
[2022-12-16 19:19:58,537] [INFO] [launch.py:164:main] Setting CUDA_VISIBLE_DEVICES=0
[2022-12-16 19:20:02,860] [INFO] [comm.py:654:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
12/16/2022 19:20:03 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: True
12/16/2022 19:20:03 - INFO - __main__ - Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=ds_config.json,
disable_tqdm=False,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=1000,
evaluation_strategy=steps,
fp16=True,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_max_length=225,
generation_num_beams=None,
gradient_accumulation_steps=2,
gradient_checkpointing=True,
greater_is_better=False,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=1e-05,
length_column_name=input_length,
load_best_model_at_end=True,
local_rank=0,
log_level=passive,
log_level_replica=passive,
log_on_each_node=True,
logging_dir=./runs/Dec16_19-20-02_129-146-123-136,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=25,
logging_strategy=steps,
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=5000,
metric_for_best_model=wer,
mp_parameters=,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_hf,
optim_args=None,
output_dir=./,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=32,
predict_with_generate=True,
prediction_loss_only=False,
push_to_hub=True,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=,
ray_scope=last,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=./,
save_on_each_node=False,
save_steps=1000,
save_strategy=steps,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
sortish_sampler=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=500,
weight_decay=0.0,
xpu_backend=None,
)
12/16/2022 19:20:05 - INFO - datasets.info - Loading Dataset Infos from /home/milan/.cache/huggingface/modules/datasets_modules/datasets/mozilla-foundation--common_voice_11_0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f
12/16/2022 19:20:05 - INFO - datasets.builder - Overwrite dataset info from restored data version.
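The training arguments fix how many samples each optimizer step consumes: with a single GPU (`dist_world_size=1`), `per_device_train_batch_size=32` and `gradient_accumulation_steps=2` give 64 samples per step, which is the `train_batch_size` that DeepSpeed reports later in this log. The helpers below are a hypothetical sketch of that arithmetic, not code from `run_speech_recognition_seq2seq_streaming.py`. They also assume that the second counter in the `timer.py` records (`0/20`, `0/40`, ...) counts micro-batches, which is consistent with those records lining up with the `step=10`, `step=20`, ... records.

```python
def effective_batch_size(per_device_train_batch_size: int,
                         gradient_accumulation_steps: int,
                         world_size: int) -> int:
    """Samples consumed by one optimizer step (hypothetical helper)."""
    return per_device_train_batch_size * gradient_accumulation_steps * world_size


def optimizer_step_from_micro_step(micro_step: int, gradient_accumulation_steps: int) -> int:
    """Optimizer steps completed after `micro_step` micro-batches (assumed counter meaning)."""
    return micro_step // gradient_accumulation_steps


batch = effective_batch_size(32, 2, 1)
print(batch)                                  # 64 samples per optimizer step
print(batch * 5000)                           # 320000 samples over --max_steps=5000
print(optimizer_step_from_micro_step(20, 2))  # 10: timer record "0/20" vs. "step=10"
```

Skipped (overflowed) steps still consume their micro-batches, so the sample count over `max_steps` does not depend on how many steps were skipped.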
12/16/2022 19:20:05 - INFO - datasets.info - Loading Dataset info from /home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/cs/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f
12/16/2022 19:20:05 - WARNING - datasets.builder - Found cached dataset common_voice_11_0 (/home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/cs/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f)
12/16/2022 19:20:05 - INFO - datasets.info - Loading Dataset info from /home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/cs/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f
12/16/2022 19:20:06 - INFO - datasets.info - Loading Dataset Infos from /home/milan/.cache/huggingface/modules/datasets_modules/datasets/mozilla-foundation--common_voice_11_0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f
12/16/2022 19:20:06 - INFO - datasets.builder - Overwrite dataset info from restored data version.
12/16/2022 19:20:06 - INFO - datasets.info - Loading Dataset info from /home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/cs/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f
12/16/2022 19:20:06 - WARNING - datasets.builder - Found cached dataset common_voice_11_0 (/home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/cs/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f)
12/16/2022 19:20:06 - INFO - datasets.info - Loading Dataset info from /home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/cs/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f
12/16/2022 19:20:08 - INFO - datasets.info - Loading Dataset Infos from /home/milan/.cache/huggingface/modules/datasets_modules/datasets/mozilla-foundation--common_voice_11_0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f
12/16/2022 19:20:08 - INFO - datasets.builder - Overwrite dataset info from restored data version.
12/16/2022 19:20:08 - INFO - datasets.info - Loading Dataset info from /home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/cs/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f
12/16/2022 19:20:08 - WARNING - datasets.builder - Found cached dataset common_voice_11_0 (/home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/cs/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f)
12/16/2022 19:20:08 - INFO - datasets.info - Loading Dataset info from /home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/cs/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f
12/16/2022 19:20:27 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/cs/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f/cache-3d5c448b6a2bf0f7.arrow
12/16/2022 19:20:29 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/cs/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f/cache-73e3a5936553e76c.arrow
12/16/2022 19:40:11 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/cs/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f/cache-3470671e4cfe112f.arrow
12/16/2022 19:40:13 - WARNING - huggingface_hub.repository - /home/milan/whisper-large2-czech-cv11-v2/./ is already a clone of https://huggingface.co/mikr/whisper-large2-czech-cv11-v2. Make sure you pull the latest changes with `repo.git_pull()`.
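About 20 minutes pass between `Caching processed dataset` (19:20:29) and the next record (19:40:11) while a processed dataset is written to cache. Assuming the script follows the Hugging Face Whisper fine-tuning examples, `--length_column_name=input_length` holds the number of audio samples per clip, and `--max_duration_in_seconds=30` becomes a sample-count threshold at Whisper's 16 kHz sampling rate. A minimal sketch of such a length filter; the function name and the 0-second minimum are assumptions, not values shown in this log:

```python
SAMPLING_RATE = 16_000           # Whisper feature extractors expect 16 kHz audio
MAX_DURATION_IN_SECONDS = 30.0   # --max_duration_in_seconds=30
MIN_DURATION_IN_SECONDS = 0.0    # assumed default; not shown in this log

max_input_length = MAX_DURATION_IN_SECONDS * SAMPLING_RATE   # 480,000 samples
min_input_length = MIN_DURATION_IN_SECONDS * SAMPLING_RATE


def is_audio_in_length_range(input_length: int) -> bool:
    """Keep clips strictly between the minimum and maximum length (in samples)."""
    return min_input_length < input_length < max_input_length


lengths = [0, 5 * SAMPLING_RATE, 30 * SAMPLING_RATE, 31 * SAMPLING_RATE]
print([is_audio_in_length_range(n) for n in lengths])  # [False, True, False, False]
```

Filtering on a precomputed `input_length` column avoids decoding the audio again, which matters when the preprocessing pass alone takes tens of minutes.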
[2022-12-16 19:40:17,786] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.7, git-hash=unknown, git-branch=unknown
[2022-12-16 19:40:18,780] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2022-12-16 19:40:19,982] [WARNING] [cpu_adam.py:83:__init__] FP16 params for CPUAdam may not work on AMD CPUs
Installed CUDA version 11.6 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
ninja: no work to do.
Time to load cpu_adam op: 3.031318426132202 seconds
Adam Optimizer #0 is created with AVX2 arithmetic capability.
Config: alpha=0.000010, betas=(0.900000, 0.999000), weight_decay=0.000000, adam_w=1
[2022-12-16 19:40:24,909] [INFO] [logging.py:68:log_dist] [Rank 0] Using DeepSpeed Optimizer param name adamw as basic optimizer
[2022-12-16 19:40:25,211] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = DeepSpeedCPUAdam
[2022-12-16 19:40:25,212] [INFO] [utils.py:52:is_zero_supported_optimizer] Checking ZeRO support for optimizer=DeepSpeedCPUAdam type=
[2022-12-16 19:40:25,212] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 2 optimizer
[2022-12-16 19:40:25,212] [INFO] [stage_1_and_2.py:140:__init__] Reduce bucket size 200000000
[2022-12-16 19:40:25,212] [INFO] [stage_1_and_2.py:141:__init__] Allgather bucket size 200000000
[2022-12-16 19:40:25,212] [INFO] [stage_1_and_2.py:142:__init__] CPU Offload: True
[2022-12-16 19:40:25,212] [INFO] [stage_1_and_2.py:143:__init__] Round robin gradient partitioning: False
ninja: no work to do.
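This run uses ZeRO stage 2 with `CPU Offload: True`, so the optimizer is `DeepSpeedCPUAdam` and its fp32 state lives in host memory. DeepSpeed next reports a single partition of 1543304960 parameters and host memory rising from 15.46 GB to 35.2 GB across optimizer-state initialization. A back-of-the-envelope estimate, assuming 4-byte fp32 tensors for the master weights and for both Adam moments (DeepSpeed's actual buffer layout may differ):

```python
NUM_PARAMS = 1_543_304_960   # "partition count [1] and sizes[(1543304960, False)]"
BYTES_PER_FP32 = 4


def adam_host_state_bytes(num_params: int, fp32_tensors_per_param: int = 3) -> int:
    """Bytes for fp32 master weights + exp_avg + exp_avg_sq (assumed layout)."""
    return num_params * BYTES_PER_FP32 * fp32_tensors_per_param


estimate = adam_host_state_bytes(NUM_PARAMS)
print(f"{estimate / 2**30:.2f} GiB")     # 17.25 GiB
observed_gb = 35.2 - 15.46               # jump reported by see_memory_usage
print(f"{observed_gb:.2f} GB observed")  # 19.74 GB observed
```

The estimate is of the same order as the observed jump; the exact accounting depends on when DeepSpeed allocates each buffer, which this log does not show.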
Time to load utils op: 0.5200150012969971 seconds
Rank: 0 partition count [1] and sizes[(1543304960, False)]
[2022-12-16 19:40:29,582] [INFO] [utils.py:827:see_memory_usage] Before initializing optimizer states
[2022-12-16 19:40:29,583] [INFO] [utils.py:828:see_memory_usage] MA 3.0 GB Max_MA 3.0 GB CA 5.99 GB Max_CA 6 GB
[2022-12-16 19:40:29,583] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 15.46 GB, percent = 7.9%
[2022-12-16 19:40:33,634] [INFO] [utils.py:827:see_memory_usage] After initializing optimizer states
[2022-12-16 19:40:33,634] [INFO] [utils.py:828:see_memory_usage] MA 3.0 GB Max_MA 3.0 GB CA 5.99 GB Max_CA 6 GB
[2022-12-16 19:40:33,635] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 35.2 GB, percent = 17.9%
[2022-12-16 19:40:33,635] [INFO] [stage_1_and_2.py:525:__init__] optimizer state initialized
[2022-12-16 19:40:33,721] [INFO] [utils.py:827:see_memory_usage] After initializing ZeRO optimizer
[2022-12-16 19:40:33,722] [INFO] [utils.py:828:see_memory_usage] MA 3.0 GB Max_MA 3.0 GB CA 5.99 GB Max_CA 6 GB
[2022-12-16 19:40:33,723] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 35.13 GB, percent = 17.9%
[2022-12-16 19:40:33,756] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = adamw
[2022-12-16 19:40:33,756] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using configured LR scheduler = WarmupDecayLR
[2022-12-16 19:40:33,757] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = 
[2022-12-16 19:40:33,757] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[1e-05], mom=[[0.9, 0.999]]
[2022-12-16 19:40:33,759] [INFO] [config.py:1020:print] DeepSpeedEngine configuration:
[2022-12-16 19:40:33,759] [INFO] [config.py:1024:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
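Because `ds_config.json` defines a scheduler, DeepSpeed uses `WarmupDecayLR` rather than the Trainer's `lr_scheduler_type=linear`. The `lr` values logged during warm-up (for example `3.535580269163017e-06` at `step=10, skipped=1`) are consistent with a logarithmic ramp, `lr = warmup_min_lr + (warmup_max_lr - warmup_min_lr) * ln(t + 1) / ln(warmup_num_steps)`, with `t = step - skipped - 1`; overflow-skipped steps apparently do not advance the scheduler. The sketch below reproduces that ramp. The linear decay to zero after `warmup_num_steps` is an assumption based on the scheduler's name and its `total_num_steps` parameter, and cannot be checked against this excerpt.

```python
import math


def warmup_decay_lr(t: int,
                    warmup_min_lr: float = 0.0,
                    warmup_max_lr: float = 1e-5,
                    warmup_num_steps: int = 500,
                    total_num_steps: int = 5000) -> float:
    """Learning rate after scheduler iteration t (t >= 0); defaults from ds_config.json."""
    if t < warmup_num_steps:
        # Logarithmic warm-up: 0 at t=0, warmup_max_lr at t = warmup_num_steps - 1.
        gamma = math.log(t + 1) / math.log(warmup_num_steps)
    else:
        # Assumed linear decay, reaching 0 at total_num_steps.
        gamma = max(0.0, (total_num_steps - t) / max(1, total_num_steps - warmup_num_steps))
    return warmup_min_lr + (warmup_max_lr - warmup_min_lr) * gamma


# (step, skipped) pairs as printed by the log_dist records in this log
for step, skipped in [(10, 1), (20, 2), (30, 3), (40, 3), (50, 3), (60, 3)]:
    print(step, f"{warmup_decay_lr(step - skipped - 1):.6e}")
```

The same formula also matches the Trainer's `'learning_rate'` entries, e.g. `4.973833272194737e-06` at logging step 25 with three steps skipped (`t = 21`).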
[2022-12-16 19:40:33,759] [INFO] [config.py:1024:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2022-12-16 19:40:33,759] [INFO] [config.py:1024:print] amp_enabled .................. False
[2022-12-16 19:40:33,759] [INFO] [config.py:1024:print] amp_params ................... False
[2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 }
[2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] bfloat16_enabled ............. False
[2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] checkpoint_parallel_write_pipeline False
[2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] checkpoint_tag_validation_enabled True
[2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] checkpoint_tag_validation_fail False
[2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] comms_config ................. 
[2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] communication_data_type ...... None
[2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] curriculum_enabled ........... False
[2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] curriculum_params ............ False
[2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] dataloader_drop_last ......... False
[2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] disable_allgather ............ False
[2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] dump_state ................... False
[2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 1000, 'delayed_shift': 2, 'min_scale': 1}
[2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] eigenvalue_enabled ........... False
[2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] eigenvalue_gas_boundary_resolution 1
[2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] eigenvalue_layer_name ........ bert.encoder.layer
[2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] eigenvalue_layer_num ......... 0
[2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] eigenvalue_max_iter .......... 100
[2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] eigenvalue_stability ......... 1e-06
[2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] eigenvalue_tol ............... 0.01
[2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] eigenvalue_verbose ........... False
[2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] elasticity_enabled ........... False
[2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] fp16_auto_cast ............... False
[2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] fp16_enabled ................. True
[2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] fp16_master_weights_and_gradients False
[2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] global_rank .................. 0
[2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] grad_accum_dtype ............. None
[2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] gradient_accumulation_steps .. 2
[2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] gradient_clipping ............ 1.0
[2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] gradient_predivide_factor .... 1.0
[2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] initial_dynamic_scale ........ 65536
[2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] load_universal_checkpoint .... False
[2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] loss_scale ................... 0
[2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] memory_breakdown ............. False
[2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] monitor_config ............... 
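The `dynamic_loss_scale_args` above (`init_scale` 65536, `scale_window` 1000, `delayed_shift` 2, `min_scale` 1) correspond to the `fp16` block of `ds_config.json` and account for the `OVERFLOW!` records during training: with a hysteresis of 2, the first overflow only uses up one unit of hysteresis (`reducing to 65536`), and each later overflow in this excerpt halves the scale (65536 → 32768 → 16384). The class below is a simplified sketch modeled on that behavior, not DeepSpeed's own loss scaler; the growth rule (double the scale after `scale_window` overflow-free iterations) is included as an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class DynamicLossScaler:
    """Simplified dynamic loss scaler with hysteresis (sketch; defaults from ds_config.json)."""
    cur_scale: float = 2.0 ** 16   # initial_scale_power = 16 -> 65536
    scale_factor: float = 2.0
    scale_window: int = 1000       # loss_scale_window
    min_scale: float = 1.0         # min_loss_scale
    delayed_shift: int = 2         # hysteresis
    cur_iter: int = 0
    last_overflow_iter: int = -1
    cur_hysteresis: int = field(init=False, default=0)

    def __post_init__(self) -> None:
        self.cur_hysteresis = self.delayed_shift

    def update(self, overflow: bool) -> float:
        """Record one iteration and return the loss scale to use next."""
        if overflow:
            if self.delayed_shift == 1 or self.cur_hysteresis == 1:
                # Hysteresis exhausted: halve the scale, but never below min_scale.
                self.cur_scale = max(self.cur_scale / self.scale_factor, self.min_scale)
            else:
                # Tolerate this overflow without shrinking the scale.
                self.cur_hysteresis -= 1
            self.last_overflow_iter = self.cur_iter
        elif (self.cur_iter - self.last_overflow_iter) % self.scale_window == 0:
            # scale_window iterations without overflow: restore hysteresis and grow the scale.
            self.cur_hysteresis = self.delayed_shift
            self.cur_scale *= self.scale_factor
        self.cur_iter += 1
        return self.cur_scale


scaler = DynamicLossScaler()
# Hypothetical overflow pattern: three overflows a few steps apart, as in this excerpt.
pattern = [False] * 9 + [True] + [False] * 3 + [True] + [False] * 8 + [True]
scales = [scaler.update(flag) for flag in pattern]
print([s for s, flag in zip(scales, pattern) if flag])  # [65536.0, 32768.0, 16384.0]
```

Skipping the optimizer update on overflow is what produces the `skipped=` counter in the `log_dist` records.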
[2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null }
[2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] optimizer_legacy_fusion ...... False
[2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] optimizer_name ............... adamw
[2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] optimizer_params ............. {'lr': 1e-05, 'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0.0}
[2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] pld_enabled .................. False
[2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] pld_params ................... False
[2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] prescale_gradients ........... False
[2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] scheduler_name ............... WarmupDecayLR
[2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] scheduler_params ............. {'last_batch_iteration': -1, 'total_num_steps': 5000, 'warmup_min_lr': 0, 'warmup_max_lr': 1e-05, 'warmup_num_steps': 500}
[2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] sparse_attention ............. None
[2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] sparse_gradients_enabled ..... False
[2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] steps_per_print .............. 10
[2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] train_batch_size ............. 64
[2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] train_micro_batch_size_per_gpu 32
[2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] use_node_local_storage ....... False
[2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] wall_clock_breakdown ......... False
[2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] world_size ................... 1
[2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] zero_allow_untested_optimizer False
[2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=200000000 allgather_partitions=True allgather_bucket_size=200000000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='cpu', nvme_path=None, buffer_count=4, pin_memory=True, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False
[2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] zero_enabled ................. True
[2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] zero_optimization_stage ...... 2
[2022-12-16 19:40:33,763] [INFO] [config.py:1009:print_user_config] json = {
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": 1e-05,
            "betas": [0.9, 0.999],
            "eps": 1e-08,
            "weight_decay": 0.0
        }
    },
    "scheduler": {
        "type": "WarmupDecayLR",
        "params": {
            "last_batch_iteration": -1,
            "total_num_steps": 5.000000e+03,
            "warmup_min_lr": 0,
            "warmup_max_lr": 1e-05,
            "warmup_num_steps": 500
        }
    },
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": true
        },
        "allgather_partitions": true,
        "allgather_bucket_size": 2.000000e+08,
        "overlap_comm": true,
        "reduce_scatter": true,
        "reduce_bucket_size": 2.000000e+08,
        "contiguous_gradients": true
    },
    "gradient_accumulation_steps": 2,
    "gradient_clipping": 1.0,
    "train_batch_size": 64,
    "train_micro_batch_size_per_gpu": 32
}
Time to load utils op: 0.0003948211669921875 seconds
[2022-12-16 19:40:58,606] [INFO] [timer.py:197:stop] 0/4, RunningAvgSamplesPerSec=6.327062880977527, CurrSamplesPerSec=5.683973434872449, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:41:09,925] [INFO] [timer.py:197:stop] 0/6, RunningAvgSamplesPerSec=6.337890979134199, CurrSamplesPerSec=5.698936745189652, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:41:21,308] [INFO] [timer.py:197:stop] 0/8, RunningAvgSamplesPerSec=6.3294469227923305, CurrSamplesPerSec=5.6523551541575205, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:41:33,017] [INFO] [timer.py:197:stop] 0/10, RunningAvgSamplesPerSec=6.328546175322321, CurrSamplesPerSec=5.701343759212486, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:41:44,654] [INFO] [timer.py:197:stop] 0/12, RunningAvgSamplesPerSec=6.330046764141762, CurrSamplesPerSec=5.7140661343466865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:41:56,029] [INFO] [timer.py:197:stop] 0/14, RunningAvgSamplesPerSec=6.327367592679242, CurrSamplesPerSec=5.687009205382302, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:42:07,620] [INFO] [timer.py:197:stop] 0/16, RunningAvgSamplesPerSec=6.324036355417439, CurrSamplesPerSec=5.67106537300076, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:42:19,208] [INFO] [timer.py:197:stop] 0/18, RunningAvgSamplesPerSec=6.324517029843766, CurrSamplesPerSec=5.686187866744037, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:42:30,000] [INFO] [stage_1_and_2.py:1765:step] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 65536
[2022-12-16 19:42:30,002] [INFO] [logging.py:68:log_dist] [Rank 0] step=10, skipped=1, lr=[3.535580269163017e-06], mom=[[0.9, 0.999]]
[2022-12-16 19:42:30,003] [INFO] [timer.py:197:stop] 0/20, RunningAvgSamplesPerSec=6.364053760974696, CurrSamplesPerSec=6.352128353972973, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:42:41,491] [INFO] [timer.py:197:stop] 0/22, RunningAvgSamplesPerSec=6.359150016371227, CurrSamplesPerSec=5.681020735272481, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:42:53,045] [INFO] [timer.py:197:stop] 0/24, RunningAvgSamplesPerSec=6.356683117345163, CurrSamplesPerSec=5.686370954837155, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:43:03,984] [INFO] [stage_1_and_2.py:1765:step] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768.0
[2022-12-16 19:43:03,986] [INFO] [timer.py:197:stop] 0/26, RunningAvgSamplesPerSec=6.376183713003614, CurrSamplesPerSec=6.175548481452842, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:43:15,357] [INFO] [timer.py:197:stop] 0/28, RunningAvgSamplesPerSec=6.370924228753169, CurrSamplesPerSec=5.667562406103809, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:43:26,742] [INFO] [timer.py:197:stop] 0/30, RunningAvgSamplesPerSec=6.366664670767918, CurrSamplesPerSec=5.69276426047378, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:43:38,322] [INFO] [timer.py:197:stop] 0/32, RunningAvgSamplesPerSec=6.3545668974505904, CurrSamplesPerSec=5.480904775405896, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:43:49,702] [INFO] [timer.py:197:stop] 0/34, RunningAvgSamplesPerSec=6.350895118978619, CurrSamplesPerSec=5.64039048535495, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:44:01,055] [INFO] [timer.py:197:stop] 0/36, RunningAvgSamplesPerSec=6.34923010740825, CurrSamplesPerSec=5.697280637929649, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:44:12,507] [INFO] [timer.py:197:stop] 0/38, RunningAvgSamplesPerSec=6.344325528095929, CurrSamplesPerSec=5.606869186879086, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:44:23,841] [INFO] [logging.py:68:log_dist] [Rank 0] step=20, skipped=2, lr=[4.650931663140581e-06], mom=[[0.9, 0.999]]
[2022-12-16 19:44:23,843] [INFO] [timer.py:197:stop] 0/40, RunningAvgSamplesPerSec=6.343791060705328, CurrSamplesPerSec=5.697582226434801, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:44:35,251] [INFO] [timer.py:197:stop] 0/42, RunningAvgSamplesPerSec=6.341041313223634, CurrSamplesPerSec=5.669263060201902, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:44:45,976] [INFO] [stage_1_and_2.py:1765:step] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768.0, reducing to 16384.0
[2022-12-16 19:44:45,978] [INFO] [timer.py:197:stop] 0/44, RunningAvgSamplesPerSec=6.358073639041419, CurrSamplesPerSec=6.36476204743189, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:44:57,311] [INFO] [timer.py:197:stop] 0/46, RunningAvgSamplesPerSec=6.357102766194718, CurrSamplesPerSec=5.694208770316031, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:45:08,706] [INFO] [timer.py:197:stop] 0/48, RunningAvgSamplesPerSec=6.354161775690528, CurrSamplesPerSec=5.654900942339342, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:45:20,362] [INFO] [timer.py:197:stop] 0/50, RunningAvgSamplesPerSec=6.35051370610784, CurrSamplesPerSec=5.6561494303952085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.3246, 'learning_rate': 4.973833272194737e-06, 'epoch': 0.11}
[2022-12-16 19:45:31,689] [INFO] [timer.py:197:stop] 0/52, RunningAvgSamplesPerSec=6.349782726929668, CurrSamplesPerSec=5.685572439917749, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:45:43,248] [INFO] [timer.py:197:stop] 0/54, RunningAvgSamplesPerSec=6.34880004266226, CurrSamplesPerSec=5.6963738880964865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:45:54,899] [INFO] [timer.py:197:stop] 0/56, RunningAvgSamplesPerSec=6.345344332473558, CurrSamplesPerSec=5.643884141232875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:46:06,245] [INFO] [timer.py:197:stop] 0/58, RunningAvgSamplesPerSec=6.3448005032861134, CurrSamplesPerSec=5.690518647702642, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:46:17,576] [INFO] [logging.py:68:log_dist] [Rank 0] step=30, skipped=3, lr=[5.303370403744525e-06], mom=[[0.9, 0.999]]
[2022-12-16 19:46:17,578] [INFO] [timer.py:197:stop] 0/60, RunningAvgSamplesPerSec=6.344274046935118, CurrSamplesPerSec=5.7001045334137865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:46:29,195] [INFO] [timer.py:197:stop] 0/62, RunningAvgSamplesPerSec=6.342096151339661, CurrSamplesPerSec=5.669778676768185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:46:40,581] [INFO] [timer.py:197:stop] 0/64, RunningAvgSamplesPerSec=6.340969890059656, CurrSamplesPerSec=5.679663422407637, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:46:51,901] [INFO] [timer.py:197:stop] 0/66, RunningAvgSamplesPerSec=6.340862832858599, CurrSamplesPerSec=5.708144139586262, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:47:03,352] [INFO] [timer.py:197:stop] 0/68, RunningAvgSamplesPerSec=6.3395299478138005, CurrSamplesPerSec=5.699751121093182, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:47:14,725] [INFO] [timer.py:197:stop] 0/70, RunningAvgSamplesPerSec=6.338858901572622, CurrSamplesPerSec=5.6697978375746505, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:47:26,090] [INFO] [timer.py:197:stop] 0/72, RunningAvgSamplesPerSec=6.338275884730358, CurrSamplesPerSec=5.6821056575417135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:47:37,502] [INFO] [timer.py:197:stop] 0/74, RunningAvgSamplesPerSec=6.336656821130215, CurrSamplesPerSec=5.693186355412951, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:47:48,841] [INFO] [timer.py:197:stop] 0/76, RunningAvgSamplesPerSec=6.336628677865746, CurrSamplesPerSec=5.704229867270073, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:48:00,193] [INFO] [timer.py:197:stop] 0/78, RunningAvgSamplesPerSec=6.336391619354949, CurrSamplesPerSec=5.6949493023399285, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:48:11,884] [INFO] [logging.py:68:log_dist] [Rank 0] step=40, skipped=3, lr=[5.810371073215365e-06], mom=[[0.9, 0.999]]
[2022-12-16 19:48:11,886] [INFO] [timer.py:197:stop] 0/80, RunningAvgSamplesPerSec=6.336877312300462, CurrSamplesPerSec=5.711746323017021, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:48:23,476] [INFO] [timer.py:197:stop] 0/82, RunningAvgSamplesPerSec=6.336437280416101, CurrSamplesPerSec=5.700073789642822, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:48:34,945] [INFO] [timer.py:197:stop] 0/84, RunningAvgSamplesPerSec=6.334408253614145, CurrSamplesPerSec=5.564071227950201, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:48:46,254] [INFO] [timer.py:197:stop] 0/86, RunningAvgSamplesPerSec=6.33464567786091, CurrSamplesPerSec=5.702617444594408, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:48:57,589] [INFO] [timer.py:197:stop] 0/88, RunningAvgSamplesPerSec=6.334718218022051, CurrSamplesPerSec=5.6997610450876905, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:49:09,242] [INFO] [timer.py:197:stop] 0/90, RunningAvgSamplesPerSec=6.330253477273904, CurrSamplesPerSec=5.385235114108773, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:49:20,553] [INFO] [timer.py:197:stop] 0/92, RunningAvgSamplesPerSec=6.330733283816013, CurrSamplesPerSec=5.722714157612833, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:49:31,916] [INFO] [timer.py:197:stop] 0/94, RunningAvgSamplesPerSec=6.330690353473429, CurrSamplesPerSec=5.702511080508184, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:49:43,602] [INFO] [timer.py:197:stop] 0/96, RunningAvgSamplesPerSec=6.330121487925524, CurrSamplesPerSec=5.651281561996539, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:49:54,961] [INFO] [timer.py:197:stop] 0/98, RunningAvgSamplesPerSec=6.3297297307440985, CurrSamplesPerSec=5.654540248857908, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:50:06,563] [INFO] [logging.py:68:log_dist] [Rank 0] step=50, skipped=3, lr=[6.195318418690893e-06], mom=[[0.9, 0.999]]
[2022-12-16 19:50:06,565] [INFO] [timer.py:197:stop] 0/100, RunningAvgSamplesPerSec=6.329719324883379, CurrSamplesPerSec=5.702907240231407, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.1691, 'learning_rate': 6.195318418690893e-06, 'epoch': 0.21}
[2022-12-16 19:50:18,089] [INFO] [timer.py:197:stop] 0/102, RunningAvgSamplesPerSec=6.328697909114363, CurrSamplesPerSec=5.685371341049817, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:50:29,422] [INFO] [timer.py:197:stop] 0/104, RunningAvgSamplesPerSec=6.329127259216088, CurrSamplesPerSec=5.715023301398578, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:50:40,909] [INFO] [timer.py:197:stop] 0/106, RunningAvgSamplesPerSec=6.329489892401154, CurrSamplesPerSec=5.732910611937274, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:50:52,391] [INFO] [timer.py:197:stop] 0/108, RunningAvgSamplesPerSec=6.3291082887675385, CurrSamplesPerSec=5.713503516490646, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:51:03,672] [INFO] [timer.py:197:stop] 0/110, RunningAvgSamplesPerSec=6.329714179875415, CurrSamplesPerSec=5.717954946591897, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:51:15,047] [INFO] [timer.py:197:stop] 0/112, RunningAvgSamplesPerSec=6.330206129328904, CurrSamplesPerSec=5.731479198460716, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:51:26,433] [INFO] [timer.py:197:stop] 0/114, RunningAvgSamplesPerSec=6.330185801461701, CurrSamplesPerSec=5.697561668064762, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:51:37,747] [INFO] [timer.py:197:stop] 0/116, RunningAvgSamplesPerSec=6.3305736060670466, CurrSamplesPerSec=5.721073711370483, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:51:49,305] [INFO] [timer.py:197:stop] 0/118, RunningAvgSamplesPerSec=6.330816278593838, CurrSamplesPerSec=5.722915222713305, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:52:00,804] [INFO] [logging.py:68:log_dist] [Rank 0] step=60, skipped=3, lr=[6.505722008216461e-06], mom=[[0.9, 0.999]]
[2022-12-16 19:52:00,806] [INFO] [timer.py:197:stop] 0/120, RunningAvgSamplesPerSec=6.330232978736771, CurrSamplesPerSec=5.6565146202221674, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 19:52:12,174] [INFO] 
[timer.py:197:stop] 0/122, RunningAvgSamplesPerSec=6.330056877233613, CurrSamplesPerSec=5.686394323508017, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:52:23,596] [INFO] [timer.py:197:stop] 0/124, RunningAvgSamplesPerSec=6.330396119925766, CurrSamplesPerSec=5.719701094321522, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:52:35,126] [INFO] [timer.py:197:stop] 0/126, RunningAvgSamplesPerSec=6.330297511420275, CurrSamplesPerSec=5.711849385744503, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:52:46,490] [INFO] [timer.py:197:stop] 0/128, RunningAvgSamplesPerSec=6.330124817932588, CurrSamplesPerSec=5.687188249668486, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:52:57,974] [INFO] [timer.py:197:stop] 0/130, RunningAvgSamplesPerSec=6.33039768866297, CurrSamplesPerSec=5.708138070552879, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:53:09,484] [INFO] [timer.py:197:stop] 0/132, RunningAvgSamplesPerSec=6.330326619642486, CurrSamplesPerSec=5.711341157262751, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:53:20,796] [INFO] [timer.py:197:stop] 0/134, RunningAvgSamplesPerSec=6.3306785190922925, CurrSamplesPerSec=5.725297601502475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:53:32,117] [INFO] [timer.py:197:stop] 0/136, RunningAvgSamplesPerSec=6.330777395173404, CurrSamplesPerSec=5.700345895641132, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:53:43,732] [INFO] [timer.py:197:stop] 0/138, RunningAvgSamplesPerSec=6.328317929913682, CurrSamplesPerSec=5.704530009876204, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:53:55,069] [INFO] [logging.py:68:log_dist] [Rank 0] step=70, skipped=3, lr=[6.765821034569313e-06], mom=[[0.9, 0.999]] [2022-12-16 19:53:55,070] [INFO] [timer.py:197:stop] 0/140, RunningAvgSamplesPerSec=6.3281437768124755, CurrSamplesPerSec=5.6646511134992545, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:54:06,392] [INFO] [timer.py:197:stop] 
0/142, RunningAvgSamplesPerSec=6.328266137695921, CurrSamplesPerSec=5.723089458024658, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:54:17,774] [INFO] [timer.py:197:stop] 0/144, RunningAvgSamplesPerSec=6.327982292630433, CurrSamplesPerSec=5.701898171177177, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:54:29,098] [INFO] [timer.py:197:stop] 0/146, RunningAvgSamplesPerSec=6.3282000633196125, CurrSamplesPerSec=5.7053862444388335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:54:40,438] [INFO] [timer.py:197:stop] 0/148, RunningAvgSamplesPerSec=6.328155482147814, CurrSamplesPerSec=5.673894035889023, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:54:52,053] [INFO] [timer.py:197:stop] 0/150, RunningAvgSamplesPerSec=6.328496355103054, CurrSamplesPerSec=5.7220768941292555, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.1644, 'learning_rate': 6.881634451095711e-06, 'epoch': 0.32} [2022-12-16 19:55:03,365] [INFO] [timer.py:197:stop] 0/152, RunningAvgSamplesPerSec=6.328569418735792, CurrSamplesPerSec=5.68189566352516, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:55:14,855] [INFO] [timer.py:197:stop] 0/154, RunningAvgSamplesPerSec=6.3275040965459715, CurrSamplesPerSec=5.571994189949849, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:55:26,452] [INFO] [timer.py:197:stop] 0/156, RunningAvgSamplesPerSec=6.327817724665443, CurrSamplesPerSec=5.721640992027081, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:55:37,782] [INFO] [timer.py:197:stop] 0/158, RunningAvgSamplesPerSec=6.327978159826033, CurrSamplesPerSec=5.712968001389319, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:55:49,157] [INFO] [logging.py:68:log_dist] [Rank 0] step=80, skipped=3, lr=[6.9896691039239e-06], mom=[[0.9, 0.999]] [2022-12-16 19:55:49,159] [INFO] [timer.py:197:stop] 0/160, RunningAvgSamplesPerSec=6.328182428410735, CurrSamplesPerSec=5.717271739216092, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-16 19:56:00,611] [INFO] [timer.py:197:stop] 0/162, RunningAvgSamplesPerSec=6.327661285455147, CurrSamplesPerSec=5.696292899154895, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:56:11,980] [INFO] [timer.py:197:stop] 0/164, RunningAvgSamplesPerSec=6.327531151891573, CurrSamplesPerSec=5.687551433336378, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:56:23,313] [INFO] [timer.py:197:stop] 0/166, RunningAvgSamplesPerSec=6.327629144751145, CurrSamplesPerSec=5.690251580192909, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:56:34,645] [INFO] [timer.py:197:stop] 0/168, RunningAvgSamplesPerSec=6.327745174849396, CurrSamplesPerSec=5.682591613865132, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:56:45,962] [INFO] [timer.py:197:stop] 0/170, RunningAvgSamplesPerSec=6.3278631523075575, CurrSamplesPerSec=5.70634268898286, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:56:57,300] [INFO] [timer.py:197:stop] 0/172, RunningAvgSamplesPerSec=6.327832359726225, CurrSamplesPerSec=5.692901410316983, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:57:08,611] [INFO] [timer.py:197:stop] 0/174, RunningAvgSamplesPerSec=6.327975893412131, CurrSamplesPerSec=5.691621682502721, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:57:19,938] [INFO] [timer.py:197:stop] 0/176, RunningAvgSamplesPerSec=6.328006607076309, CurrSamplesPerSec=5.674112793988129, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:57:31,335] [INFO] [timer.py:197:stop] 0/178, RunningAvgSamplesPerSec=6.327706582184348, CurrSamplesPerSec=5.648043028531768, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:57:42,679] [INFO] [logging.py:68:log_dist] [Rank 0] step=90, skipped=3, lr=[7.186146009413563e-06], mom=[[0.9, 0.999]] [2022-12-16 19:57:42,680] [INFO] [timer.py:197:stop] 0/180, RunningAvgSamplesPerSec=6.327737545730115, CurrSamplesPerSec=5.679434382740355, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:57:54,053] 
[INFO] [timer.py:197:stop] 0/182, RunningAvgSamplesPerSec=6.327567772018162, CurrSamplesPerSec=5.671728233835681, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:58:05,553] [INFO] [timer.py:197:stop] 0/184, RunningAvgSamplesPerSec=6.327321171217064, CurrSamplesPerSec=5.650057103503205, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:58:17,043] [INFO] [timer.py:197:stop] 0/186, RunningAvgSamplesPerSec=6.327257340860697, CurrSamplesPerSec=5.696403866676768, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:58:28,641] [INFO] [timer.py:197:stop] 0/188, RunningAvgSamplesPerSec=6.325485038741021, CurrSamplesPerSec=5.448433209976314, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:58:40,184] [INFO] [timer.py:197:stop] 0/190, RunningAvgSamplesPerSec=6.325371332058678, CurrSamplesPerSec=5.650139399406368, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:58:51,535] [INFO] [timer.py:197:stop] 0/192, RunningAvgSamplesPerSec=6.32535828915325, CurrSamplesPerSec=5.693068268772343, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:59:03,080] [INFO] [timer.py:197:stop] 0/194, RunningAvgSamplesPerSec=6.324000040671213, CurrSamplesPerSec=5.484969185042463, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:59:14,449] [INFO] [timer.py:197:stop] 0/196, RunningAvgSamplesPerSec=6.323911524635286, CurrSamplesPerSec=5.656116060308147, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:59:25,865] [INFO] [timer.py:197:stop] 0/198, RunningAvgSamplesPerSec=6.32353250731071, CurrSamplesPerSec=5.666084499205607, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:59:37,404] [INFO] [logging.py:68:log_dist] [Rank 0] step=100, skipped=3, lr=[7.361221988663844e-06], mom=[[0.9, 0.999]] [2022-12-16 19:59:37,406] [INFO] [timer.py:197:stop] 0/200, RunningAvgSamplesPerSec=6.3223735336504, CurrSamplesPerSec=5.500500901581866, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.1458, 'learning_rate': 
7.361221988663844e-06, 'epoch': 0.42} [2022-12-16 19:59:48,750] [INFO] [timer.py:197:stop] 0/202, RunningAvgSamplesPerSec=6.322279298198284, CurrSamplesPerSec=5.673023008677945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:00:00,166] [INFO] [timer.py:197:stop] 0/204, RunningAvgSamplesPerSec=6.322269734290734, CurrSamplesPerSec=5.667173773723511, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:00:11,649] [INFO] [timer.py:197:stop] 0/206, RunningAvgSamplesPerSec=6.321527490293199, CurrSamplesPerSec=5.603557346378851, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:00:23,040] [INFO] [timer.py:197:stop] 0/208, RunningAvgSamplesPerSec=6.321413257018045, CurrSamplesPerSec=5.6625088443919855, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:00:34,546] [INFO] [timer.py:197:stop] 0/210, RunningAvgSamplesPerSec=6.32141400676977, CurrSamplesPerSec=5.6849306591484146, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:00:46,129] [INFO] [timer.py:197:stop] 0/212, RunningAvgSamplesPerSec=6.321367640712001, CurrSamplesPerSec=5.6768530161234745, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:00:57,500] [INFO] [timer.py:197:stop] 0/214, RunningAvgSamplesPerSec=6.321339276033958, CurrSamplesPerSec=5.702498239554356, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:01:08,843] [INFO] [timer.py:197:stop] 0/216, RunningAvgSamplesPerSec=6.321463085232523, CurrSamplesPerSec=5.712604482269962, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:01:20,279] [INFO] [timer.py:197:stop] 0/218, RunningAvgSamplesPerSec=6.321074042641444, CurrSamplesPerSec=5.642485922448768, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:01:31,627] [INFO] [logging.py:68:log_dist] [Rank 0] step=110, skipped=3, lr=[7.5191046007362515e-06], mom=[[0.9, 0.999]] [2022-12-16 20:01:31,629] [INFO] [timer.py:197:stop] 0/220, RunningAvgSamplesPerSec=6.321164389806482, CurrSamplesPerSec=5.698712681387359, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-16 20:01:43,012] [INFO] [timer.py:197:stop] 0/222, RunningAvgSamplesPerSec=6.321083383820289, CurrSamplesPerSec=5.675591772493804, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:01:54,342] [INFO] [timer.py:197:stop] 0/224, RunningAvgSamplesPerSec=6.321181478696793, CurrSamplesPerSec=5.701490768394447, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:02:05,720] [INFO] [timer.py:197:stop] 0/226, RunningAvgSamplesPerSec=6.321103908583707, CurrSamplesPerSec=5.6887073263966546, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:02:17,052] [INFO] [timer.py:197:stop] 0/228, RunningAvgSamplesPerSec=6.321279121862528, CurrSamplesPerSec=5.7162519815248904, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:02:28,413] [INFO] [timer.py:197:stop] 0/230, RunningAvgSamplesPerSec=6.321298670377185, CurrSamplesPerSec=5.709214433335775, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:02:39,723] [INFO] [timer.py:197:stop] 0/232, RunningAvgSamplesPerSec=6.321502115012851, CurrSamplesPerSec=5.7279660020557355, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:02:51,084] [INFO] [timer.py:197:stop] 0/234, RunningAvgSamplesPerSec=6.321540937522988, CurrSamplesPerSec=5.69602673978007, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:03:02,438] [INFO] [timer.py:197:stop] 0/236, RunningAvgSamplesPerSec=6.32159137687095, CurrSamplesPerSec=5.691098465896081, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:03:13,796] [INFO] [timer.py:197:stop] 0/238, RunningAvgSamplesPerSec=6.321628403692657, CurrSamplesPerSec=5.69404691784024, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:03:25,137] [INFO] [logging.py:68:log_dist] [Rank 0] step=120, skipped=3, lr=[7.662870867121632e-06], mom=[[0.9, 0.999]] [2022-12-16 20:03:25,138] [INFO] [timer.py:197:stop] 0/240, RunningAvgSamplesPerSec=6.321748647888827, CurrSamplesPerSec=5.696099018567, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-16 20:03:36,598] [INFO] [timer.py:197:stop] 0/242, RunningAvgSamplesPerSec=6.321880011749018, CurrSamplesPerSec=5.714113571526187, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:03:47,936] [INFO] [timer.py:197:stop] 0/244, RunningAvgSamplesPerSec=6.322009363444331, CurrSamplesPerSec=5.6904749791235485, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:03:59,258] [INFO] [timer.py:197:stop] 0/246, RunningAvgSamplesPerSec=6.322221643460198, CurrSamplesPerSec=5.7263758040612815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:04:10,613] [INFO] [timer.py:197:stop] 0/248, RunningAvgSamplesPerSec=6.322263624452999, CurrSamplesPerSec=5.692347539349729, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:04:21,955] [INFO] [timer.py:197:stop] 0/250, RunningAvgSamplesPerSec=6.3222029662188115, CurrSamplesPerSec=5.67385973653264, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.1389, 'learning_rate': 7.730207550743121e-06, 'epoch': 0.53} [2022-12-16 20:04:33,296] [INFO] [timer.py:197:stop] 0/252, RunningAvgSamplesPerSec=6.322313955883044, CurrSamplesPerSec=5.701953642527591, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:04:44,678] [INFO] [timer.py:197:stop] 0/254, RunningAvgSamplesPerSec=6.322151144538406, CurrSamplesPerSec=5.665580077082235, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:04:56,010] [INFO] [timer.py:197:stop] 0/256, RunningAvgSamplesPerSec=6.322301777645738, CurrSamplesPerSec=5.718121328046148, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:05:07,595] [INFO] [timer.py:197:stop] 0/258, RunningAvgSamplesPerSec=6.322330489705541, CurrSamplesPerSec=5.707673706391819, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:05:18,934] [INFO] [logging.py:68:log_dist] [Rank 0] step=130, skipped=3, lr=[7.794839207460995e-06], mom=[[0.9, 0.999]] [2022-12-16 20:05:18,935] [INFO] [timer.py:197:stop] 0/260, RunningAvgSamplesPerSec=6.322448219599348, 
CurrSamplesPerSec=5.706156857008184, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:05:30,262] [INFO] [timer.py:197:stop] 0/262, RunningAvgSamplesPerSec=6.322617585015774, CurrSamplesPerSec=5.711549930757668, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:05:41,626] [INFO] [timer.py:197:stop] 0/264, RunningAvgSamplesPerSec=6.322613341576608, CurrSamplesPerSec=5.687131619368439, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:05:53,023] [INFO] [timer.py:197:stop] 0/266, RunningAvgSamplesPerSec=6.322451874197788, CurrSamplesPerSec=5.676991081195374, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:06:04,411] [INFO] [timer.py:197:stop] 0/268, RunningAvgSamplesPerSec=6.322351880434625, CurrSamplesPerSec=5.67797814191006, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:06:15,738] [INFO] [timer.py:197:stop] 0/270, RunningAvgSamplesPerSec=6.322369845515796, CurrSamplesPerSec=5.689833538393228, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:06:27,100] [INFO] [timer.py:197:stop] 0/272, RunningAvgSamplesPerSec=6.322374139413545, CurrSamplesPerSec=5.697104101334954, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:06:38,457] [INFO] [timer.py:197:stop] 0/274, RunningAvgSamplesPerSec=6.322333531347227, CurrSamplesPerSec=5.704905353797042, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:06:49,837] [INFO] [timer.py:197:stop] 0/276, RunningAvgSamplesPerSec=6.3221861198507225, CurrSamplesPerSec=5.6565828005700505, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:07:01,155] [INFO] [timer.py:197:stop] 0/278, RunningAvgSamplesPerSec=6.32240021581047, CurrSamplesPerSec=5.725513502620633, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:07:12,519] [INFO] [logging.py:68:log_dist] [Rank 0] step=140, skipped=3, lr=[7.916799978227501e-06], mom=[[0.9, 0.999]] [2022-12-16 20:07:12,521] [INFO] [timer.py:197:stop] 0/280, RunningAvgSamplesPerSec=6.322314987769751, 
CurrSamplesPerSec=5.682835585653388, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:07:23,835] [INFO] [timer.py:197:stop] 0/282, RunningAvgSamplesPerSec=6.322476369079172, CurrSamplesPerSec=5.728346147854608, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:07:35,189] [INFO] [timer.py:197:stop] 0/284, RunningAvgSamplesPerSec=6.322524555667764, CurrSamplesPerSec=5.702553722581063, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:07:46,547] [INFO] [timer.py:197:stop] 0/286, RunningAvgSamplesPerSec=6.322601907843428, CurrSamplesPerSec=5.701636331907292, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:07:57,869] [INFO] [timer.py:197:stop] 0/288, RunningAvgSamplesPerSec=6.32279292584737, CurrSamplesPerSec=5.722436496272999, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:08:09,214] [INFO] [timer.py:197:stop] 0/290, RunningAvgSamplesPerSec=6.3228057686994665, CurrSamplesPerSec=5.686518156577721, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:08:20,583] [INFO] [timer.py:197:stop] 0/292, RunningAvgSamplesPerSec=6.3227806640395166, CurrSamplesPerSec=5.679599491324056, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:08:31,994] [INFO] [timer.py:197:stop] 0/294, RunningAvgSamplesPerSec=6.322781839290583, CurrSamplesPerSec=5.6987397810401985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:08:43,359] [INFO] [timer.py:197:stop] 0/296, RunningAvgSamplesPerSec=6.3228034210183734, CurrSamplesPerSec=5.697710417265229, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:08:54,709] [INFO] [timer.py:197:stop] 0/298, RunningAvgSamplesPerSec=6.322863685882392, CurrSamplesPerSec=5.703479406551039, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:09:06,039] [INFO] [logging.py:68:log_dist] [Rank 0] step=150, skipped=3, lr=[8.03016458599496e-06], mom=[[0.9, 0.999]] [2022-12-16 20:09:06,041] [INFO] [timer.py:197:stop] 0/300, RunningAvgSamplesPerSec=6.322928885435723, 
CurrSamplesPerSec=5.696764601432161, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.1376, 'learning_rate': 8.03016458599496e-06, 'epoch': 0.64} [2022-12-16 20:09:17,433] [INFO] [timer.py:197:stop] 0/302, RunningAvgSamplesPerSec=6.322817514724475, CurrSamplesPerSec=5.675178281500542, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:09:28,800] [INFO] [timer.py:197:stop] 0/304, RunningAvgSamplesPerSec=6.322797946509351, CurrSamplesPerSec=5.68351227186433, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:09:40,183] [INFO] [timer.py:197:stop] 0/306, RunningAvgSamplesPerSec=6.322896759479059, CurrSamplesPerSec=5.728840535096935, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:09:51,543] [INFO] [timer.py:197:stop] 0/308, RunningAvgSamplesPerSec=6.322910192350933, CurrSamplesPerSec=5.701197726295654, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:10:02,885] [INFO] [timer.py:197:stop] 0/310, RunningAvgSamplesPerSec=6.3230024007557315, CurrSamplesPerSec=5.718039475930382, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:10:14,283] [INFO] [timer.py:197:stop] 0/312, RunningAvgSamplesPerSec=6.322866428944482, CurrSamplesPerSec=5.6628476184268015, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:10:25,595] [INFO] [timer.py:197:stop] 0/314, RunningAvgSamplesPerSec=6.323075360071841, CurrSamplesPerSec=5.708922539388064, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:10:36,913] [INFO] [timer.py:197:stop] 0/316, RunningAvgSamplesPerSec=6.323254930695002, CurrSamplesPerSec=5.7192082831832405, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:10:48,220] [INFO] [timer.py:197:stop] 0/318, RunningAvgSamplesPerSec=6.323405337207393, CurrSamplesPerSec=5.70960253823397, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:10:59,560] [INFO] [logging.py:68:log_dist] [Rank 0] step=160, skipped=3, lr=[8.136065420813943e-06], mom=[[0.9, 0.999]] [2022-12-16 20:10:59,562] [INFO] [timer.py:197:stop] 
0/320, RunningAvgSamplesPerSec=6.323477777341223, CurrSamplesPerSec=5.689709560887472, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:11:10,948] [INFO] [timer.py:197:stop] 0/322, RunningAvgSamplesPerSec=6.323460607589859, CurrSamplesPerSec=5.670344454119559, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:11:22,293] [INFO] [timer.py:197:stop] 0/324, RunningAvgSamplesPerSec=6.323463740424428, CurrSamplesPerSec=5.677860204925066, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:11:33,643] [INFO] [timer.py:197:stop] 0/326, RunningAvgSamplesPerSec=6.323565869006063, CurrSamplesPerSec=5.704370236980602, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:11:44,948] [INFO] [timer.py:197:stop] 0/328, RunningAvgSamplesPerSec=6.323660017579624, CurrSamplesPerSec=5.709596951869308, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:11:56,267] [INFO] [timer.py:197:stop] 0/330, RunningAvgSamplesPerSec=6.323823018324491, CurrSamplesPerSec=5.712549046630418, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:12:07,628] [INFO] [timer.py:197:stop] 0/332, RunningAvgSamplesPerSec=6.323832251532743, CurrSamplesPerSec=5.6960204547548505, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:12:18,964] [INFO] [timer.py:197:stop] 0/334, RunningAvgSamplesPerSec=6.323933543769019, CurrSamplesPerSec=5.692614320759774, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:12:30,318] [INFO] [timer.py:197:stop] 0/336, RunningAvgSamplesPerSec=6.323971450955896, CurrSamplesPerSec=5.704066717376007, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:12:41,675] [INFO] [timer.py:197:stop] 0/338, RunningAvgSamplesPerSec=6.3239844985564035, CurrSamplesPerSec=5.698655579391169, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:12:53,043] [INFO] [logging.py:68:log_dist] [Rank 0] step=170, skipped=3, lr=[8.235424875329062e-06], mom=[[0.9, 0.999]] [2022-12-16 20:12:53,045] [INFO] [timer.py:197:stop] 0/340, 
RunningAvgSamplesPerSec=6.323948740973426, CurrSamplesPerSec=5.685194337650517, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:13:04,403] [INFO] [timer.py:197:stop] 0/342, RunningAvgSamplesPerSec=6.32396319355309, CurrSamplesPerSec=5.698694534442487, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:13:15,751] [INFO] [timer.py:197:stop] 0/344, RunningAvgSamplesPerSec=6.3240232984687195, CurrSamplesPerSec=5.707178839997753, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:13:27,102] [INFO] [timer.py:197:stop] 0/346, RunningAvgSamplesPerSec=6.324050524078457, CurrSamplesPerSec=5.720586271562472, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:13:38,482] [INFO] [timer.py:197:stop] 0/348, RunningAvgSamplesPerSec=6.323927304234203, CurrSamplesPerSec=5.6728395802686995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:13:49,894] [INFO] [timer.py:197:stop] 0/350, RunningAvgSamplesPerSec=6.323775163215871, CurrSamplesPerSec=5.6732903798269225, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.1374, 'learning_rate': 8.282894746203441e-06, 'epoch': 0.74} [2022-12-16 20:14:01,273] [INFO] [timer.py:197:stop] 0/352, RunningAvgSamplesPerSec=6.323744099538408, CurrSamplesPerSec=5.701922878806938, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:14:12,649] [INFO] [timer.py:197:stop] 0/354, RunningAvgSamplesPerSec=6.32363656035121, CurrSamplesPerSec=5.668106198002945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:14:24,018] [INFO] [timer.py:197:stop] 0/356, RunningAvgSamplesPerSec=6.323612329000168, CurrSamplesPerSec=5.690963815932577, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:14:35,387] [INFO] [timer.py:197:stop] 0/358, RunningAvgSamplesPerSec=6.323592321881347, CurrSamplesPerSec=5.682652484549207, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:14:46,747] [INFO] [logging.py:68:log_dist] [Rank 0] step=180, skipped=3, lr=[8.329004259959669e-06], mom=[[0.9, 0.999]] [2022-12-16 
20:14:46,749] [INFO] [timer.py:197:stop] 0/360, RunningAvgSamplesPerSec=6.323480334411763, CurrSamplesPerSec=5.657225111507851, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:14:58,121] [INFO] [timer.py:197:stop] 0/362, RunningAvgSamplesPerSec=6.323431372663673, CurrSamplesPerSec=5.678844444331616, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:15:09,479] [INFO] [timer.py:197:stop] 0/364, RunningAvgSamplesPerSec=6.32339248589731, CurrSamplesPerSec=5.671576284583911, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:15:20,883] [INFO] [timer.py:197:stop] 0/366, RunningAvgSamplesPerSec=6.323253600927474, CurrSamplesPerSec=5.6644067881016404, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:15:32,248] [INFO] [timer.py:197:stop] 0/368, RunningAvgSamplesPerSec=6.323253715509086, CurrSamplesPerSec=5.701989251343157, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:15:43,589] [INFO] [timer.py:197:stop] 0/370, RunningAvgSamplesPerSec=6.323333571683783, CurrSamplesPerSec=5.709123607893714, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:15:54,984] [INFO] [timer.py:197:stop] 0/372, RunningAvgSamplesPerSec=6.3231740893087585, CurrSamplesPerSec=5.660488032411503, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:16:06,407] [INFO] [timer.py:197:stop] 0/374, RunningAvgSamplesPerSec=6.3231801853876135, CurrSamplesPerSec=5.70347698290224, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:16:17,744] [INFO] [timer.py:197:stop] 0/376, RunningAvgSamplesPerSec=6.323287920909756, CurrSamplesPerSec=5.702711939935516, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:16:29,091] [INFO] [timer.py:197:stop] 0/378, RunningAvgSamplesPerSec=6.323286122683462, CurrSamplesPerSec=5.687698937227285, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:16:40,418] [INFO] [logging.py:68:log_dist] [Rank 0] step=190, skipped=3, lr=[8.417439256037237e-06], mom=[[0.9, 0.999]] [2022-12-16 20:16:40,420] [INFO] 
[timer.py:197:stop] 0/380, RunningAvgSamplesPerSec=6.323397786842634, CurrSamplesPerSec=5.71710686834569, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:16:51,771] [INFO] [timer.py:197:stop] 0/382, RunningAvgSamplesPerSec=6.323379578682893, CurrSamplesPerSec=5.697321750762022, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:17:03,330] [INFO] [timer.py:197:stop] 0/384, RunningAvgSamplesPerSec=6.323437318866515, CurrSamplesPerSec=5.70628228013329, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:17:14,693] [INFO] [timer.py:197:stop] 0/386, RunningAvgSamplesPerSec=6.323427514274825, CurrSamplesPerSec=5.700057328547446, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:17:26,094] [INFO] [timer.py:197:stop] 0/388, RunningAvgSamplesPerSec=6.323306500731329, CurrSamplesPerSec=5.678972033656878, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:17:37,448] [INFO] [timer.py:197:stop] 0/390, RunningAvgSamplesPerSec=6.32331374462535, CurrSamplesPerSec=5.687762568718759, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:17:48,794] [INFO] [timer.py:197:stop] 0/392, RunningAvgSamplesPerSec=6.323321526858228, CurrSamplesPerSec=5.692411274897352, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:18:00,146] [INFO] [timer.py:197:stop] 0/394, RunningAvgSamplesPerSec=6.323302134655659, CurrSamplesPerSec=5.682476612732748, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:18:11,512] [INFO] [timer.py:197:stop] 0/396, RunningAvgSamplesPerSec=6.323284170376101, CurrSamplesPerSec=5.6785734261450145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:18:23,016] [INFO] [timer.py:197:stop] 0/398, RunningAvgSamplesPerSec=6.323343518822214, CurrSamplesPerSec=5.709000973847339, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:18:34,373] [INFO] [logging.py:68:log_dist] [Rank 0] step=200, skipped=3, lr=[8.501266121799902e-06], mom=[[0.9, 0.999]]
[2022-12-16 20:18:34,375] [INFO] [timer.py:197:stop] 0/400, RunningAvgSamplesPerSec=6.32328421011125, CurrSamplesPerSec=5.6872694618968955, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.1287, 'learning_rate': 8.501266121799902e-06, 'epoch': 0.85}
[2022-12-16 20:18:45,718] [INFO] [timer.py:197:stop] 0/402, RunningAvgSamplesPerSec=6.323284665364629, CurrSamplesPerSec=5.690986739759475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:18:57,149] [INFO] [timer.py:197:stop] 0/404, RunningAvgSamplesPerSec=6.323064316517982, CurrSamplesPerSec=5.632093995709347, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:19:07,878] [INFO] [stage_1_and_2.py:1765:step] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384.0, reducing to 8192.0
[2022-12-16 20:19:07,880] [INFO] [timer.py:197:stop] 0/406, RunningAvgSamplesPerSec=6.324963121796406, CurrSamplesPerSec=6.401309166670601, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:19:19,240] [INFO] [timer.py:197:stop] 0/408, RunningAvgSamplesPerSec=6.3249501610981325, CurrSamplesPerSec=5.702367168220342, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:19:30,606] [INFO] [timer.py:197:stop] 0/410, RunningAvgSamplesPerSec=6.324919708467864, CurrSamplesPerSec=5.679302927602853, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:19:41,978] [INFO] [timer.py:197:stop] 0/412, RunningAvgSamplesPerSec=6.324875286613773, CurrSamplesPerSec=5.687440087427545, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:19:53,395] [INFO] [timer.py:197:stop] 0/414, RunningAvgSamplesPerSec=6.32487659033467, CurrSamplesPerSec=5.692081989929908, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:20:04,724] [INFO] [timer.py:197:stop] 0/416, RunningAvgSamplesPerSec=6.32491419453475, CurrSamplesPerSec=5.70353587815117, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:20:16,092] [INFO] [timer.py:197:stop] 0/418, RunningAvgSamplesPerSec=6.324881861283918, CurrSamplesPerSec=5.668392257188035, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:20:27,452] [INFO] [logging.py:68:log_dist] [Rank 0] step=210, skipped=4, lr=[8.573149077803088e-06], mom=[[0.9, 0.999]]
[2022-12-16 20:20:27,454] [INFO] [timer.py:197:stop] 0/420, RunningAvgSamplesPerSec=6.324865240196057, CurrSamplesPerSec=5.678834112490291, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:20:38,813] [INFO] [timer.py:197:stop] 0/422, RunningAvgSamplesPerSec=6.324857186084627, CurrSamplesPerSec=5.690756785945908, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:20:50,150] [INFO] [timer.py:197:stop] 0/424, RunningAvgSamplesPerSec=6.324868863513124, CurrSamplesPerSec=5.700334274925125, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:21:01,502] [INFO] [timer.py:197:stop] 0/426, RunningAvgSamplesPerSec=6.324832543627125, CurrSamplesPerSec=5.698868991857596, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:21:12,881] [INFO] [timer.py:197:stop] 0/428, RunningAvgSamplesPerSec=6.324672136242403, CurrSamplesPerSec=5.638443932850549, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:21:24,242] [INFO] [timer.py:197:stop] 0/430, RunningAvgSamplesPerSec=6.324660692835897, CurrSamplesPerSec=5.690038088967808, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:21:35,582] [INFO] [timer.py:197:stop] 0/432, RunningAvgSamplesPerSec=6.324708429204134, CurrSamplesPerSec=5.704986345333319, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:21:46,949] [INFO] [timer.py:197:stop] 0/434, RunningAvgSamplesPerSec=6.324708609830103, CurrSamplesPerSec=5.6855702723091985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:21:58,286] [INFO] [timer.py:197:stop] 0/436, RunningAvgSamplesPerSec=6.324717537651673, CurrSamplesPerSec=5.70938394951087, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:22:09,632] [INFO] [timer.py:197:stop] 0/438, RunningAvgSamplesPerSec=6.324745496447643, CurrSamplesPerSec=5.714860750176948, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:22:20,926] [INFO] [logging.py:68:log_dist] [Rank 0] step=220, skipped=4, lr=[8.64942458567722e-06], mom=[[0.9, 0.999]]
[2022-12-16 20:22:20,928] [INFO] [timer.py:197:stop] 0/440, RunningAvgSamplesPerSec=6.324929046177664, CurrSamplesPerSec=5.729245008296917, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:22:32,230] [INFO] [timer.py:197:stop] 0/442, RunningAvgSamplesPerSec=6.325051638910014, CurrSamplesPerSec=5.723627116481493, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:22:43,588] [INFO] [timer.py:197:stop] 0/444, RunningAvgSamplesPerSec=6.325065678448945, CurrSamplesPerSec=5.693626627517337, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:22:54,930] [INFO] [timer.py:197:stop] 0/446, RunningAvgSamplesPerSec=6.325110434850466, CurrSamplesPerSec=5.710919039097566, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:23:06,252] [INFO] [timer.py:197:stop] 0/448, RunningAvgSamplesPerSec=6.325167702164579, CurrSamplesPerSec=5.705739870822447, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:23:17,632] [INFO] [timer.py:197:stop] 0/450, RunningAvgSamplesPerSec=6.325104594839814, CurrSamplesPerSec=5.653316282959546, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.1225, 'learning_rate': 8.686247975778677e-06, 'epoch': 0.95}
[2022-12-16 20:23:29,039] [INFO] [timer.py:197:stop] 0/452, RunningAvgSamplesPerSec=6.3249838784193, CurrSamplesPerSec=5.654148160062199, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:23:40,379] [INFO] [timer.py:197:stop] 0/454, RunningAvgSamplesPerSec=6.325056824708796, CurrSamplesPerSec=5.716592834256081, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:23:51,801] [INFO] [timer.py:197:stop] 0/456, RunningAvgSamplesPerSec=6.324918730493177, CurrSamplesPerSec=5.666096698292425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:24:03,158] [INFO] [timer.py:197:stop] 0/458, RunningAvgSamplesPerSec=6.3249018266408745, CurrSamplesPerSec=5.6668892730962055, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:24:14,505] [INFO] [logging.py:68:log_dist] [Rank 0] step=230, skipped=4, lr=[8.722247506883805e-06], mom=[[0.9, 0.999]]
[2022-12-16 20:24:14,507] [INFO] [timer.py:197:stop] 0/460, RunningAvgSamplesPerSec=6.324939407967614, CurrSamplesPerSec=5.702041090822069, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:24:25,830] [INFO] [timer.py:197:stop] 0/462, RunningAvgSamplesPerSec=6.3249985635148835, CurrSamplesPerSec=5.695622592543446, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:24:37,171] [INFO] [timer.py:197:stop] 0/464, RunningAvgSamplesPerSec=6.325064934601947, CurrSamplesPerSec=5.706983489691964, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:24:47,866] [INFO] [stage_1_and_2.py:1765:step] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192.0, reducing to 4096.0
[2022-12-16 20:24:47,868] [INFO] [timer.py:197:stop] 0/466, RunningAvgSamplesPerSec=6.326814356293397, CurrSamplesPerSec=6.417096329026, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:24:59,202] [INFO] [timer.py:197:stop] 0/468, RunningAvgSamplesPerSec=6.326888453208697, CurrSamplesPerSec=5.710727563347197, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:25:10,596] [INFO] [timer.py:197:stop] 0/470, RunningAvgSamplesPerSec=6.32680186680534, CurrSamplesPerSec=5.657430662739473, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:25:19,121] [INFO] [timer.py:197:stop] 0/472, RunningAvgSamplesPerSec=6.333450758411454, CurrSamplesPerSec=10.137390170749411, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:25:30,504] [INFO] [timer.py:197:stop] 0/474, RunningAvgSamplesPerSec=6.333370618689054, CurrSamplesPerSec=5.659949520087447, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:25:41,898] [INFO] [timer.py:197:stop] 0/476, RunningAvgSamplesPerSec=6.333330907948472, CurrSamplesPerSec=5.689830402708782, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:25:53,272] [INFO] [timer.py:197:stop] 0/478, RunningAvgSamplesPerSec=6.333309972082303, CurrSamplesPerSec=5.679410830915424, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:26:04,786] [INFO] [logging.py:68:log_dist] [Rank 0] step=240, skipped=5, lr=[8.785084156039184e-06], mom=[[0.9, 0.999]]
[2022-12-16 20:26:04,788] [INFO] [timer.py:197:stop] 0/480, RunningAvgSamplesPerSec=6.333324209339847, CurrSamplesPerSec=5.711554548737424, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:26:16,153] [INFO] [timer.py:197:stop] 0/482, RunningAvgSamplesPerSec=6.333275648262772, CurrSamplesPerSec=5.67584738469762, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:26:27,536] [INFO] [timer.py:197:stop] 0/484, RunningAvgSamplesPerSec=6.333180443017503, CurrSamplesPerSec=5.658684801072698, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:26:38,936] [INFO] [timer.py:197:stop] 0/486, RunningAvgSamplesPerSec=6.333063356107658, CurrSamplesPerSec=5.66522088952818, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:26:50,275] [INFO] [timer.py:197:stop] 0/488, RunningAvgSamplesPerSec=6.333064958163027, CurrSamplesPerSec=5.716922526210353, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:27:01,596] [INFO] [timer.py:197:stop] 0/490, RunningAvgSamplesPerSec=6.333097497150548, CurrSamplesPerSec=5.708483783263635, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:27:12,974] [INFO] [timer.py:197:stop] 0/492, RunningAvgSamplesPerSec=6.333030716341655, CurrSamplesPerSec=5.669087297819405, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:27:24,283] [INFO] [timer.py:197:stop] 0/494, RunningAvgSamplesPerSec=6.33308950716907, CurrSamplesPerSec=5.72159806401921, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:27:35,659] [INFO] [timer.py:197:stop] 0/496, RunningAvgSamplesPerSec=6.333030116862234, CurrSamplesPerSec=5.685264896868218, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:27:47,023] [INFO] [timer.py:197:stop] 0/498, RunningAvgSamplesPerSec=6.332996476016625, CurrSamplesPerSec=5.69285118566385, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:27:58,379] [INFO] [logging.py:68:log_dist] [Rank 0] step=250, skipped=5, lr=[8.852140188761744e-06], mom=[[0.9, 0.999]]
[2022-12-16 20:27:58,380] [INFO] [timer.py:197:stop] 0/500, RunningAvgSamplesPerSec=6.332979581463783, CurrSamplesPerSec=5.68674728607022, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0911, 'learning_rate': 8.852140188761744e-06, 'epoch': 1.06}
[2022-12-16 20:28:09,749] [INFO] [timer.py:197:stop] 0/502, RunningAvgSamplesPerSec=6.332968543294653, CurrSamplesPerSec=5.698553960024957, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:28:21,117] [INFO] [timer.py:197:stop] 0/504, RunningAvgSamplesPerSec=6.3329375267007215, CurrSamplesPerSec=5.680363153261814, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:28:32,487] [INFO] [timer.py:197:stop] 0/506, RunningAvgSamplesPerSec=6.332904182885525, CurrSamplesPerSec=5.687939491029711, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:28:43,855] [INFO] [timer.py:197:stop] 0/508, RunningAvgSamplesPerSec=6.33285409700931, CurrSamplesPerSec=5.677202633924842, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:28:55,191] [INFO] [timer.py:197:stop] 0/510, RunningAvgSamplesPerSec=6.332775886204445, CurrSamplesPerSec=5.648280477264634, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:29:06,683] [INFO] [timer.py:197:stop] 0/512, RunningAvgSamplesPerSec=6.332755158221311, CurrSamplesPerSec=5.688899257349144, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:29:18,066] [INFO] [timer.py:197:stop] 0/514, RunningAvgSamplesPerSec=6.332672689917679, CurrSamplesPerSec=5.677710568918598, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:29:29,434] [INFO] [timer.py:197:stop] 0/516, RunningAvgSamplesPerSec=6.3326342128881095, CurrSamplesPerSec=5.682176861785842, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:29:40,802] [INFO] [timer.py:197:stop] 0/518, RunningAvgSamplesPerSec=6.332589814616271, CurrSamplesPerSec=5.674145177376899, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:29:52,188] [INFO] [logging.py:68:log_dist] [Rank 0] step=260, skipped=5, lr=[8.916513249749862e-06], mom=[[0.9, 0.999]]
[2022-12-16 20:29:52,189] [INFO] [timer.py:197:stop] 0/520, RunningAvgSamplesPerSec=6.332535441063731, CurrSamplesPerSec=5.690993254986084, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:30:03,651] [INFO] [timer.py:197:stop] 0/522, RunningAvgSamplesPerSec=6.332321192236601, CurrSamplesPerSec=5.627202311158195, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:30:15,079] [INFO] [timer.py:197:stop] 0/524, RunningAvgSamplesPerSec=6.332264757881449, CurrSamplesPerSec=5.681202047908641, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:30:26,506] [INFO] [timer.py:197:stop] 0/526, RunningAvgSamplesPerSec=6.332111989659685, CurrSamplesPerSec=5.663871593105844, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:30:37,853] [INFO] [timer.py:197:stop] 0/528, RunningAvgSamplesPerSec=6.332086845890149, CurrSamplesPerSec=5.697914808915609, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:30:49,299] [INFO] [timer.py:197:stop] 0/530, RunningAvgSamplesPerSec=6.332118305953814, CurrSamplesPerSec=5.707362068898067, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:31:00,725] [INFO] [timer.py:197:stop] 0/532, RunningAvgSamplesPerSec=6.332145135171992, CurrSamplesPerSec=5.7186139518732695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:31:12,098] [INFO] [timer.py:197:stop] 0/534, RunningAvgSamplesPerSec=6.332096332695988, CurrSamplesPerSec=5.686162813500998, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:31:23,474] [INFO] [timer.py:197:stop] 0/536, RunningAvgSamplesPerSec=6.332142347447645, CurrSamplesPerSec=5.717022610095439, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:31:34,884] [INFO] [timer.py:197:stop] 0/538, RunningAvgSamplesPerSec=6.3320406629010915, CurrSamplesPerSec=5.6585333113317695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:31:46,283] [INFO] [logging.py:68:log_dist] [Rank 0] step=270, skipped=5, lr=[8.978409800937961e-06], mom=[[0.9, 0.999]]
[2022-12-16 20:31:46,285] [INFO] [timer.py:197:stop] 0/540, RunningAvgSamplesPerSec=6.331949549833869, CurrSamplesPerSec=5.658519236318848, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:31:57,631] [INFO] [timer.py:197:stop] 0/542, RunningAvgSamplesPerSec=6.331908691993134, CurrSamplesPerSec=5.674752614311713, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:32:09,011] [INFO] [timer.py:197:stop] 0/544, RunningAvgSamplesPerSec=6.331837872115681, CurrSamplesPerSec=5.664258577485706, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:32:20,410] [INFO] [timer.py:197:stop] 0/546, RunningAvgSamplesPerSec=6.331805803303936, CurrSamplesPerSec=5.676086937031499, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:32:31,832] [INFO] [timer.py:197:stop] 0/548, RunningAvgSamplesPerSec=6.331781192265678, CurrSamplesPerSec=5.680064586753917, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:32:43,212] [INFO] [timer.py:197:stop] 0/550, RunningAvgSamplesPerSec=6.3318012701947355, CurrSamplesPerSec=5.7015488960272975, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0701, 'learning_rate': 9.00848753507038e-06, 'epoch': 1.17}
[2022-12-16 20:32:54,503] [INFO] [timer.py:197:stop] 0/552, RunningAvgSamplesPerSec=6.331942303093126, CurrSamplesPerSec=5.7685950512679565, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:33:05,886] [INFO] [timer.py:197:stop] 0/554, RunningAvgSamplesPerSec=6.331877715290052, CurrSamplesPerSec=5.652510360163067, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:33:17,264] [INFO] [timer.py:197:stop] 0/556, RunningAvgSamplesPerSec=6.331889387981903, CurrSamplesPerSec=5.6983301682483605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:33:28,711] [INFO] [timer.py:197:stop] 0/558, RunningAvgSamplesPerSec=6.331829135305767, CurrSamplesPerSec=5.683874023366699, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:33:40,068] [INFO] [logging.py:68:log_dist] [Rank 0] step=280, skipped=5, lr=[9.038013352913754e-06], mom=[[0.9, 0.999]]
[2022-12-16 20:33:40,070] [INFO] [timer.py:197:stop] 0/560, RunningAvgSamplesPerSec=6.331762277422967, CurrSamplesPerSec=5.6758255427552475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:33:51,539] [INFO] [timer.py:197:stop] 0/562, RunningAvgSamplesPerSec=6.33149639148276, CurrSamplesPerSec=5.578608736601625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:34:02,921] [INFO] [timer.py:197:stop] 0/564, RunningAvgSamplesPerSec=6.331446008676222, CurrSamplesPerSec=5.665100373416714, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:34:14,469] [INFO] [timer.py:197:stop] 0/566, RunningAvgSamplesPerSec=6.331452750529917, CurrSamplesPerSec=5.698296540516734, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:34:25,876] [INFO] [timer.py:197:stop] 0/568, RunningAvgSamplesPerSec=6.331366092485218, CurrSamplesPerSec=5.647619758537656, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:34:37,224] [INFO] [timer.py:197:stop] 0/570, RunningAvgSamplesPerSec=6.331366844477702, CurrSamplesPerSec=5.709335862237885, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:34:48,577] [INFO] [timer.py:197:stop] 0/572, RunningAvgSamplesPerSec=6.331423638643338, CurrSamplesPerSec=5.7014241652722255, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:35:00,234] [INFO] [timer.py:197:stop] 0/574, RunningAvgSamplesPerSec=6.331173142949745, CurrSamplesPerSec=5.674498540151139, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:35:11,592] [INFO] [timer.py:197:stop] 0/576, RunningAvgSamplesPerSec=6.331154306508723, CurrSamplesPerSec=5.684172749770408, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:35:23,056] [INFO] [timer.py:197:stop] 0/578, RunningAvgSamplesPerSec=6.331166114638882, CurrSamplesPerSec=5.694031457786667, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:35:34,645] [INFO] [logging.py:68:log_dist] [Rank 0] step=290, skipped=5, lr=[9.095487745564754e-06], mom=[[0.9, 0.999]]
[2022-12-16 20:35:34,647] [INFO] [timer.py:197:stop] 0/580, RunningAvgSamplesPerSec=6.33109363132063, CurrSamplesPerSec=5.688584603467747, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:35:45,976] [INFO] [timer.py:197:stop] 0/582, RunningAvgSamplesPerSec=6.331107874787885, CurrSamplesPerSec=5.708166230976763, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:35:57,335] [INFO] [timer.py:197:stop] 0/584, RunningAvgSamplesPerSec=6.331087853249488, CurrSamplesPerSec=5.686966795466654, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:36:08,878] [INFO] [timer.py:197:stop] 0/586, RunningAvgSamplesPerSec=6.331042606888964, CurrSamplesPerSec=5.681727775695035, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:36:20,260] [INFO] [timer.py:197:stop] 0/588, RunningAvgSamplesPerSec=6.330978840033063, CurrSamplesPerSec=5.674192433800074, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:36:31,673] [INFO] [timer.py:197:stop] 0/590, RunningAvgSamplesPerSec=6.331058742308961, CurrSamplesPerSec=5.72288667256049, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:36:43,263] [INFO] [timer.py:197:stop] 0/592, RunningAvgSamplesPerSec=6.331048536749523, CurrSamplesPerSec=5.697260565409411, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:36:54,663] [INFO] [timer.py:197:stop] 0/594, RunningAvgSamplesPerSec=6.330940201893569, CurrSamplesPerSec=5.636410670239449, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:37:06,011] [INFO] [timer.py:197:stop] 0/596, RunningAvgSamplesPerSec=6.330946589807912, CurrSamplesPerSec=5.697826519598953, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:37:17,405] [INFO] [timer.py:197:stop] 0/598, RunningAvgSamplesPerSec=6.330878178920464, CurrSamplesPerSec=5.708957749637251, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:37:28,767] [INFO] [logging.py:68:log_dist] [Rank 0] step=300, skipped=5, lr=[9.150979862726452e-06], mom=[[0.9, 0.999]]
[2022-12-16 20:37:28,769] [INFO] [timer.py:197:stop] 0/600, RunningAvgSamplesPerSec=6.330782670927434, CurrSamplesPerSec=5.643513461533772, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.068, 'learning_rate': 9.150979862726452e-06, 'epoch': 1.27}
[2022-12-16 20:37:40,125] [INFO] [timer.py:197:stop] 0/602, RunningAvgSamplesPerSec=6.330782383440869, CurrSamplesPerSec=5.690124207229166, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:37:51,504] [INFO] [timer.py:197:stop] 0/604, RunningAvgSamplesPerSec=6.3306832707920435, CurrSamplesPerSec=5.6678194496573235, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:38:02,851] [INFO] [timer.py:197:stop] 0/606, RunningAvgSamplesPerSec=6.330655808571751, CurrSamplesPerSec=5.682673657266739, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:38:14,168] [INFO] [timer.py:197:stop] 0/608, RunningAvgSamplesPerSec=6.330691143435459, CurrSamplesPerSec=5.6872265661282, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:38:25,693] [INFO] [timer.py:197:stop] 0/610, RunningAvgSamplesPerSec=6.330329932714655, CurrSamplesPerSec=5.668653446060722, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:38:37,073] [INFO] [timer.py:197:stop] 0/612, RunningAvgSamplesPerSec=6.330272091806936, CurrSamplesPerSec=5.659673142650592, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:38:48,518] [INFO] [timer.py:197:stop] 0/614, RunningAvgSamplesPerSec=6.330101699475937, CurrSamplesPerSec=5.62189192221988, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:39:00,031] [INFO] [timer.py:197:stop] 0/616, RunningAvgSamplesPerSec=6.330042213874827, CurrSamplesPerSec=5.6853116162218384, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:39:11,430] [INFO] [timer.py:197:stop] 0/618, RunningAvgSamplesPerSec=6.329934203787018, CurrSamplesPerSec=5.662486866094601, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:39:22,683] [INFO] [logging.py:68:log_dist] [Rank 0] step=310, skipped=5, lr=[9.204621894113846e-06], mom=[[0.9, 0.999]]
[2022-12-16 20:39:22,684] [INFO] [timer.py:197:stop] 0/620, RunningAvgSamplesPerSec=6.329990960271664, CurrSamplesPerSec=5.721851007076893, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:39:34,075] [INFO] [timer.py:197:stop] 0/622, RunningAvgSamplesPerSec=6.330011262521667, CurrSamplesPerSec=5.712285012692834, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:39:45,443] [INFO] [timer.py:197:stop] 0/624, RunningAvgSamplesPerSec=6.329971505255353, CurrSamplesPerSec=5.673877006086818, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:39:56,776] [INFO] [timer.py:197:stop] 0/626, RunningAvgSamplesPerSec=6.330001807060071, CurrSamplesPerSec=5.713938909287577, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:40:08,120] [INFO] [timer.py:197:stop] 0/628, RunningAvgSamplesPerSec=6.330083907626634, CurrSamplesPerSec=5.71931502735193, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:40:19,422] [INFO] [timer.py:197:stop] 0/630, RunningAvgSamplesPerSec=6.330141783936141, CurrSamplesPerSec=5.7165628862802125, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:40:30,734] [INFO] [timer.py:197:stop] 0/632, RunningAvgSamplesPerSec=6.33018163456153, CurrSamplesPerSec=5.710113372472082, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:40:42,220] [INFO] [timer.py:197:stop] 0/634, RunningAvgSamplesPerSec=6.330237538427551, CurrSamplesPerSec=5.711593437284221, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:40:53,732] [INFO] [timer.py:197:stop] 0/636, RunningAvgSamplesPerSec=6.330298913515279, CurrSamplesPerSec=5.720521903529705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:41:05,190] [INFO] [timer.py:197:stop] 0/638, RunningAvgSamplesPerSec=6.33008533267941, CurrSamplesPerSec=5.573157273614473, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:41:16,628] [INFO] [logging.py:68:log_dist] [Rank 0] step=320, skipped=5, lr=[9.256533232218034e-06], mom=[[0.9, 0.999]]
[2022-12-16 20:41:16,630] [INFO] [timer.py:197:stop] 0/640, RunningAvgSamplesPerSec=6.330096923728828, CurrSamplesPerSec=5.724061856567901, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:41:28,236] [INFO] [timer.py:197:stop] 0/642, RunningAvgSamplesPerSec=6.330130764892317, CurrSamplesPerSec=5.707274214632692, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:41:39,681] [INFO] [timer.py:197:stop] 0/644, RunningAvgSamplesPerSec=6.329947385873928, CurrSamplesPerSec=5.589888718849701, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:41:51,204] [INFO] [timer.py:197:stop] 0/646, RunningAvgSamplesPerSec=6.33000563881282, CurrSamplesPerSec=5.708520444862126, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:42:02,680] [INFO] [timer.py:197:stop] 0/648, RunningAvgSamplesPerSec=6.330004564500237, CurrSamplesPerSec=5.6796963498806345, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:42:14,226] [INFO] [timer.py:197:stop] 0/650, RunningAvgSamplesPerSec=6.329629407966906, CurrSamplesPerSec=5.467900776983032, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0673, 'learning_rate': 9.281874101213678e-06, 'epoch': 1.38}
[2022-12-16 20:42:25,619] [INFO] [timer.py:197:stop] 0/652, RunningAvgSamplesPerSec=6.3297077715609475, CurrSamplesPerSec=5.719371812973972, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:42:36,953] [INFO] [timer.py:197:stop] 0/654, RunningAvgSamplesPerSec=6.3297367998254535, CurrSamplesPerSec=5.705432082423167, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:42:48,561] [INFO] [timer.py:197:stop] 0/656, RunningAvgSamplesPerSec=6.329244035664368, CurrSamplesPerSec=5.4295023895352434, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:42:59,916] [INFO] [timer.py:197:stop] 0/658, RunningAvgSamplesPerSec=6.329237844863367, CurrSamplesPerSec=5.688861882844252, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:43:11,315] [INFO] [logging.py:68:log_dist] [Rank 0] step=330, skipped=5, lr=[9.306822072655195e-06], mom=[[0.9, 0.999]]
[2022-12-16 20:43:11,317] [INFO] [timer.py:197:stop] 0/660, RunningAvgSamplesPerSec=6.329147547268884, CurrSamplesPerSec=5.661335155406955, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:43:22,771] [INFO] [timer.py:197:stop] 0/662, RunningAvgSamplesPerSec=6.328950035472537, CurrSamplesPerSec=5.603190307343656, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:43:34,098] [INFO] [timer.py:197:stop] 0/664, RunningAvgSamplesPerSec=6.328962025890756, CurrSamplesPerSec=5.695756979637728, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:43:45,431] [INFO] [timer.py:197:stop] 0/666, RunningAvgSamplesPerSec=6.3289940857266, CurrSamplesPerSec=5.71611467805298, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:43:57,131] [INFO] [timer.py:197:stop] 0/668, RunningAvgSamplesPerSec=6.328336634156604, CurrSamplesPerSec=5.346707418970935, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:44:08,494] [INFO] [timer.py:197:stop] 0/670, RunningAvgSamplesPerSec=6.328314264906798, CurrSamplesPerSec=5.697728074190073, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:44:19,922] [INFO] [timer.py:197:stop] 0/672, RunningAvgSamplesPerSec=6.328180041670801, CurrSamplesPerSec=5.648812729681554, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:44:31,398] [INFO] [timer.py:197:stop] 0/674, RunningAvgSamplesPerSec=6.3281297840632105, CurrSamplesPerSec=5.669640962282183, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:44:42,725] [INFO] [timer.py:197:stop] 0/676, RunningAvgSamplesPerSec=6.328147856603661, CurrSamplesPerSec=5.706415715042744, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:44:54,300] [INFO] [timer.py:197:stop] 0/678, RunningAvgSamplesPerSec=6.328126766932452, CurrSamplesPerSec=5.706175536698659, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:45:05,797] [INFO] [logging.py:68:log_dist] [Rank 0] step=340, skipped=5, lr=[9.355586771917604e-06], mom=[[0.9, 0.999]]
[2022-12-16 20:45:05,799] [INFO] [timer.py:197:stop] 0/680, RunningAvgSamplesPerSec=6.328068517423547, CurrSamplesPerSec=5.6591135490713995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:45:17,174] [INFO] [timer.py:197:stop] 0/682, RunningAvgSamplesPerSec=6.328021957238399, CurrSamplesPerSec=5.682638529889822, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:45:28,596] [INFO] [timer.py:197:stop] 0/684, RunningAvgSamplesPerSec=6.327975433273524, CurrSamplesPerSec=5.680331179663342, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:45:40,277] [INFO] [timer.py:197:stop] 0/686, RunningAvgSamplesPerSec=6.327854974024864, CurrSamplesPerSec=5.683192919422884, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:45:51,668] [INFO] [timer.py:197:stop] 0/688, RunningAvgSamplesPerSec=6.3277861583761235, CurrSamplesPerSec=5.6660746921346865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:46:03,229] [INFO] [timer.py:197:stop] 0/690, RunningAvgSamplesPerSec=6.327768422044784, CurrSamplesPerSec=5.680019395846795, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:46:14,838] [INFO] [timer.py:197:stop] 0/692, RunningAvgSamplesPerSec=6.327696014574452, CurrSamplesPerSec=5.657069646483641, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:46:26,177] [INFO] [timer.py:197:stop] 0/694, RunningAvgSamplesPerSec=6.327699873718016, CurrSamplesPerSec=5.690042913451338, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:46:37,632] [INFO] [timer.py:197:stop] 0/696, RunningAvgSamplesPerSec=6.3276904427181755, CurrSamplesPerSec=5.683944790522634, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:46:49,248] [INFO] [timer.py:197:stop] 0/698, RunningAvgSamplesPerSec=6.327720553451032, CurrSamplesPerSec=5.706627525449706, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:47:00,579] [INFO] [logging.py:68:log_dist] [Rank 0] step=350, skipped=5, lr=[9.402917005361869e-06], mom=[[0.9, 0.999]]
[2022-12-16 20:47:00,581] [INFO] [timer.py:197:stop] 0/700, RunningAvgSamplesPerSec=6.327703647205433, CurrSamplesPerSec=5.679101551095431, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0679, 'learning_rate': 9.402917005361869e-06, 'epoch': 1.48}
[2022-12-16 20:47:11,950] [INFO] [timer.py:197:stop] 0/702, RunningAvgSamplesPerSec=6.327654490303129, CurrSamplesPerSec=5.675178761431945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:47:23,335] [INFO] [timer.py:197:stop] 0/704, RunningAvgSamplesPerSec=6.327571946536897, CurrSamplesPerSec=5.68916306298161, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:47:34,792] [INFO] [timer.py:197:stop] 0/706, RunningAvgSamplesPerSec=6.3273886387415645, CurrSamplesPerSec=5.601829945397302, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:47:46,190] [INFO] [timer.py:197:stop] 0/708, RunningAvgSamplesPerSec=6.327378689010138, CurrSamplesPerSec=5.6861664269358885, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:47:57,593] [INFO] [timer.py:197:stop] 0/710, RunningAvgSamplesPerSec=6.327316883649861, CurrSamplesPerSec=5.729989789413974, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:48:08,972] [INFO] [timer.py:197:stop] 0/712, RunningAvgSamplesPerSec=6.327270752057368, CurrSamplesPerSec=5.654470688433138, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:48:20,337] [INFO] [timer.py:197:stop] 0/714, RunningAvgSamplesPerSec=6.3272791064630285, CurrSamplesPerSec=5.7056947555522, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:48:31,825] [INFO] [timer.py:197:stop] 0/716, RunningAvgSamplesPerSec=6.327258717853537, CurrSamplesPerSec=5.680939701404044, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:48:43,188] [INFO] [timer.py:197:stop] 0/718, RunningAvgSamplesPerSec=6.327295608249954, CurrSamplesPerSec=5.7009349827558875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:48:54,690] [INFO] [logging.py:68:log_dist] [Rank 0] step=360, skipped=5, lr=[9.44889475969735e-06], mom=[[0.9, 0.999]]
[2022-12-16 20:48:54,691] [INFO] [timer.py:197:stop] 0/720, RunningAvgSamplesPerSec=6.327035979572224, CurrSamplesPerSec=5.538819333340514, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:49:06,214] [INFO] [timer.py:197:stop] 0/722, RunningAvgSamplesPerSec=6.3271111128208615, CurrSamplesPerSec=5.722535309390941, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:49:17,701] [INFO] [timer.py:197:stop] 0/724, RunningAvgSamplesPerSec=6.327128664425442, CurrSamplesPerSec=5.708196333806104, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:49:29,078] [INFO] [timer.py:197:stop] 0/726, RunningAvgSamplesPerSec=6.327087552167851, CurrSamplesPerSec=5.676013965040711, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:49:40,463] [INFO] [timer.py:197:stop] 0/728, RunningAvgSamplesPerSec=6.327109934448353, CurrSamplesPerSec=5.6931926341853965, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:49:51,845] [INFO] [timer.py:197:stop] 0/730, RunningAvgSamplesPerSec=6.327038089475233, CurrSamplesPerSec=5.669245100330484, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:50:03,266] [INFO] [timer.py:197:stop] 0/732, RunningAvgSamplesPerSec=6.326924670910652, CurrSamplesPerSec=5.621069038048019, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:50:14,582] [INFO] [timer.py:197:stop] 0/734, RunningAvgSamplesPerSec=6.326959216687658, CurrSamplesPerSec=5.701227997844524, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:50:25,927] [INFO] [timer.py:197:stop] 0/736, RunningAvgSamplesPerSec=6.326940875652652, CurrSamplesPerSec=5.674422250368862, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:50:37,316] [INFO] [timer.py:197:stop] 0/738, RunningAvgSamplesPerSec=6.326823613295239, CurrSamplesPerSec=5.6246842242873045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:50:48,661] [INFO] [logging.py:68:log_dist] [Rank 0] step=370, skipped=5, lr=[9.493595187571683e-06], mom=[[0.9, 0.999]]
[2022-12-16 20:50:48,663] [INFO] [timer.py:197:stop] 0/740, RunningAvgSamplesPerSec=6.3268314840496505, CurrSamplesPerSec=5.707714969358892, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:51:00,070] [INFO] [timer.py:197:stop] 0/742, RunningAvgSamplesPerSec=6.326866428321336, CurrSamplesPerSec=5.711735384974162, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:51:11,574] [INFO] [timer.py:197:stop] 0/744, RunningAvgSamplesPerSec=6.326713261271251, CurrSamplesPerSec=5.631804735285801, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:51:22,930] [INFO] [timer.py:197:stop] 0/746, RunningAvgSamplesPerSec=6.326691157515512, CurrSamplesPerSec=5.673347214431089, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:51:34,299] [INFO] [timer.py:197:stop] 0/748, RunningAvgSamplesPerSec=6.326638394503159, CurrSamplesPerSec=5.668620646503149, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:51:45,734] [INFO] [timer.py:197:stop] 0/750, RunningAvgSamplesPerSec=6.326511199481409, CurrSamplesPerSec=5.6745774710472, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0671, 'learning_rate': 9.51548820454122e-06, 'epoch': 1.59}
[2022-12-16 20:51:57,109] [INFO] [timer.py:197:stop] 0/752, RunningAvgSamplesPerSec=6.326485323384035, CurrSamplesPerSec=5.6853226941193356, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:52:08,661] [INFO] [timer.py:197:stop] 0/754, RunningAvgSamplesPerSec=6.326162477755519, CurrSamplesPerSec=5.498609592683866, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:52:19,985] [INFO] [timer.py:197:stop] 0/756, RunningAvgSamplesPerSec=6.326188890677415, CurrSamplesPerSec=5.701028695875893, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:52:31,324] [INFO] [timer.py:197:stop] 0/758, RunningAvgSamplesPerSec=6.326163715081771, CurrSamplesPerSec=5.669143569330741, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:52:42,739] [INFO] [logging.py:68:log_dist] [Rank 0] step=380, skipped=5, lr=[9.53708734662638e-06], mom=[[0.9, 0.999]]
[2022-12-16 20:52:42,741] [INFO] [timer.py:197:stop] 0/760, RunningAvgSamplesPerSec=6.326036630362106, CurrSamplesPerSec=5.680581689222482, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:52:54,099] [INFO] [timer.py:197:stop] 0/762, RunningAvgSamplesPerSec=6.326032010296816, CurrSamplesPerSec=5.697016078956156, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:53:05,616] [INFO] [timer.py:197:stop] 0/764, RunningAvgSamplesPerSec=6.325775425618378, CurrSamplesPerSec=5.5389985402003274, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:53:17,029] [INFO] [timer.py:197:stop] 0/766, RunningAvgSamplesPerSec=6.325833860421, CurrSamplesPerSec=5.722426005204077, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:53:28,386] [INFO] [timer.py:197:stop] 0/768, RunningAvgSamplesPerSec=6.325830415376826, CurrSamplesPerSec=5.686190275721349, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:53:39,816] [INFO] [timer.py:197:stop] 0/770, RunningAvgSamplesPerSec=6.325709680080751, CurrSamplesPerSec=5.627372182760118, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:53:51,153] [INFO] [timer.py:197:stop] 0/772, RunningAvgSamplesPerSec=6.32574588936009, CurrSamplesPerSec=5.720313205022317, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:54:02,473] [INFO] [timer.py:197:stop] 0/774, RunningAvgSamplesPerSec=6.325779808632974, CurrSamplesPerSec=5.700795024194597, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:54:14,088] [INFO] [timer.py:197:stop] 0/776, RunningAvgSamplesPerSec=6.325366824020554, CurrSamplesPerSec=5.711546041938402, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:54:25,416] [INFO] [timer.py:197:stop] 0/778, RunningAvgSamplesPerSec=6.325388120195118, CurrSamplesPerSec=5.688599551753485, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:54:36,837] [INFO] [logging.py:68:log_dist] [Rank 0] step=390, skipped=5, lr=[9.57943484127219e-06], mom=[[0.9, 0.999]]
[2022-12-16 20:54:36,839] [INFO] [timer.py:197:stop] 0/780, RunningAvgSamplesPerSec=6.325287416370516, CurrSamplesPerSec=5.61396389033286, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:54:48,184] [INFO] [timer.py:197:stop] 0/782, RunningAvgSamplesPerSec=6.325311851558554, CurrSamplesPerSec=5.685149787510647, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:54:59,503] [INFO] [timer.py:197:stop] 0/784, RunningAvgSamplesPerSec=6.325374166849946, CurrSamplesPerSec=5.711454413166849, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:55:10,900] [INFO] [timer.py:197:stop] 0/786, RunningAvgSamplesPerSec=6.3253531728228385, CurrSamplesPerSec=5.716634956736923, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:55:22,255] [INFO] [timer.py:197:stop] 0/788, RunningAvgSamplesPerSec=6.3253672114173085, CurrSamplesPerSec=5.696488243568338, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:55:33,802] [INFO] [timer.py:197:stop] 0/790, RunningAvgSamplesPerSec=6.325068850147272, CurrSamplesPerSec=5.498009999649353, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:55:45,139] [INFO] [timer.py:197:stop] 0/792, RunningAvgSamplesPerSec=6.325112867831402, CurrSamplesPerSec=5.697386081443267, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 20:55:56,474] [INFO] [timer.py:197:stop] 0/794, RunningAvgSamplesPerSec=6.325150134664613, 
CurrSamplesPerSec=5.681703002256421, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:56:07,910] [INFO] [timer.py:197:stop] 0/796, RunningAvgSamplesPerSec=6.325005608906404, CurrSamplesPerSec=5.594190346236312, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:56:19,261] [INFO] [timer.py:197:stop] 0/798, RunningAvgSamplesPerSec=6.3250222821537205, CurrSamplesPerSec=5.69781031336407, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:56:30,630] [INFO] [logging.py:68:log_dist] [Rank 0] step=400, skipped=5, lr=[9.620696382156558e-06], mom=[[0.9, 0.999]] [2022-12-16 20:56:30,632] [INFO] [timer.py:197:stop] 0/800, RunningAvgSamplesPerSec=6.324955084688922, CurrSamplesPerSec=5.645958662739079, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0739, 'learning_rate': 9.620696382156558e-06, 'epoch': 1.69} [2022-12-16 20:56:42,048] [INFO] [timer.py:197:stop] 0/802, RunningAvgSamplesPerSec=6.324866416048774, CurrSamplesPerSec=5.690673302590797, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:56:53,416] [INFO] [timer.py:197:stop] 0/804, RunningAvgSamplesPerSec=6.324849206965416, CurrSamplesPerSec=5.679540608704553, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:57:04,806] [INFO] [timer.py:197:stop] 0/806, RunningAvgSamplesPerSec=6.324771230961369, CurrSamplesPerSec=5.641830353065095, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:57:16,167] [INFO] [timer.py:197:stop] 0/808, RunningAvgSamplesPerSec=6.324769497199892, CurrSamplesPerSec=5.689590894872664, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:57:27,551] [INFO] [timer.py:197:stop] 0/810, RunningAvgSamplesPerSec=6.324736821415485, CurrSamplesPerSec=5.685807755696759, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:57:39,228] [INFO] [timer.py:197:stop] 0/812, RunningAvgSamplesPerSec=6.324646795857594, CurrSamplesPerSec=5.683070915568724, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:57:50,570] [INFO] [timer.py:197:stop] 
0/814, RunningAvgSamplesPerSec=6.324672285175087, CurrSamplesPerSec=5.70057977262278, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:58:01,922] [INFO] [timer.py:197:stop] 0/816, RunningAvgSamplesPerSec=6.324660014668366, CurrSamplesPerSec=5.694164803509382, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:58:13,588] [INFO] [timer.py:197:stop] 0/818, RunningAvgSamplesPerSec=6.324670226228126, CurrSamplesPerSec=5.698775107760346, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:58:24,936] [INFO] [logging.py:68:log_dist] [Rank 0] step=410, skipped=5, lr=[9.660926275674324e-06], mom=[[0.9, 0.999]] [2022-12-16 20:58:24,938] [INFO] [timer.py:197:stop] 0/820, RunningAvgSamplesPerSec=6.3246971948466655, CurrSamplesPerSec=5.701889208658522, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:58:36,585] [INFO] [timer.py:197:stop] 0/822, RunningAvgSamplesPerSec=6.324232119390502, CurrSamplesPerSec=5.404802825653943, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:58:47,931] [INFO] [timer.py:197:stop] 0/824, RunningAvgSamplesPerSec=6.324252861625952, CurrSamplesPerSec=5.691459011684438, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:58:59,279] [INFO] [timer.py:197:stop] 0/826, RunningAvgSamplesPerSec=6.324276862279914, CurrSamplesPerSec=5.691145039857559, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:59:10,688] [INFO] [timer.py:197:stop] 0/828, RunningAvgSamplesPerSec=6.324198814394731, CurrSamplesPerSec=5.654148874633871, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:59:22,046] [INFO] [timer.py:197:stop] 0/830, RunningAvgSamplesPerSec=6.324261490328659, CurrSamplesPerSec=5.718737973915092, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:59:33,412] [INFO] [timer.py:197:stop] 0/832, RunningAvgSamplesPerSec=6.324283135016285, CurrSamplesPerSec=5.693500069483766, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:59:44,762] [INFO] [timer.py:197:stop] 0/834, 
RunningAvgSamplesPerSec=6.3243167430060785, CurrSamplesPerSec=5.695427790856719, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:59:56,411] [INFO] [timer.py:197:stop] 0/836, RunningAvgSamplesPerSec=6.3243108942685415, CurrSamplesPerSec=5.677710568918598, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:00:08,280] [INFO] [timer.py:197:stop] 0/838, RunningAvgSamplesPerSec=6.324302607187967, CurrSamplesPerSec=5.6794915808427024, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:00:20,196] [INFO] [logging.py:68:log_dist] [Rank 0] step=420, skipped=5, lr=[9.700174853763023e-06], mom=[[0.9, 0.999]] [2022-12-16 21:00:20,197] [INFO] [timer.py:197:stop] 0/840, RunningAvgSamplesPerSec=6.324301713290257, CurrSamplesPerSec=5.685020234855203, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:00:32,195] [INFO] [timer.py:197:stop] 0/842, RunningAvgSamplesPerSec=6.324186164489815, CurrSamplesPerSec=5.6956958278894705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:00:43,595] [INFO] [timer.py:197:stop] 0/844, RunningAvgSamplesPerSec=6.324133462309744, CurrSamplesPerSec=5.6358783857465875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:00:55,109] [INFO] [timer.py:197:stop] 0/846, RunningAvgSamplesPerSec=6.324170851164534, CurrSamplesPerSec=5.717252256195339, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:01:06,656] [INFO] [timer.py:197:stop] 0/848, RunningAvgSamplesPerSec=6.324132295042807, CurrSamplesPerSec=5.647554883343113, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:01:18,191] [INFO] [timer.py:197:stop] 0/850, RunningAvgSamplesPerSec=6.323868844928134, CurrSamplesPerSec=5.5051387907455664, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0675, 'learning_rate': 9.719445885591654e-06, 'epoch': 1.8} [2022-12-16 21:01:29,778] [INFO] [timer.py:197:stop] 0/852, RunningAvgSamplesPerSec=6.323895984034452, CurrSamplesPerSec=5.701067683378528, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-16 21:01:41,155] [INFO] [timer.py:197:stop] 0/854, RunningAvgSamplesPerSec=6.323866025614896, CurrSamplesPerSec=5.680294398450153, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:01:52,629] [INFO] [timer.py:197:stop] 0/856, RunningAvgSamplesPerSec=6.323646641240253, CurrSamplesPerSec=5.551309222463983, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:02:03,977] [INFO] [timer.py:197:stop] 0/858, RunningAvgSamplesPerSec=6.323661475227139, CurrSamplesPerSec=5.7159530384475685, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:02:15,298] [INFO] [logging.py:68:log_dist] [Rank 0] step=430, skipped=5, lr=[9.738488852516646e-06], mom=[[0.9, 0.999]]
[2022-12-16 21:02:15,300] [INFO] [timer.py:197:stop] 0/860, RunningAvgSamplesPerSec=6.323710149257949, CurrSamplesPerSec=5.719581661471569, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:02:26,710] [INFO] [timer.py:197:stop] 0/862, RunningAvgSamplesPerSec=6.323634804344302, CurrSamplesPerSec=5.62526862139189, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:02:38,037] [INFO] [timer.py:197:stop] 0/864, RunningAvgSamplesPerSec=6.323649281830787, CurrSamplesPerSec=5.681464178147183, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:02:49,370] [INFO] [timer.py:197:stop] 0/866, RunningAvgSamplesPerSec=6.3236838467673975, CurrSamplesPerSec=5.705553350483724, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:03:00,794] [INFO] [timer.py:197:stop] 0/868, RunningAvgSamplesPerSec=6.323587287957588, CurrSamplesPerSec=5.591940037322728, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:03:12,130] [INFO] [timer.py:197:stop] 0/870, RunningAvgSamplesPerSec=6.323606237462717, CurrSamplesPerSec=5.7091401213955075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:03:23,687] [INFO] [timer.py:197:stop] 0/872, RunningAvgSamplesPerSec=6.323648030461905, CurrSamplesPerSec=5.714125248491172, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:03:35,327] [INFO] [timer.py:197:stop] 0/874, RunningAvgSamplesPerSec=6.32362972848686, CurrSamplesPerSec=5.665148674893301, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:03:46,681] [INFO] [timer.py:197:stop] 0/876, RunningAvgSamplesPerSec=6.323635151245853, CurrSamplesPerSec=5.690080062248546, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:03:58,094] [INFO] [timer.py:197:stop] 0/878, RunningAvgSamplesPerSec=6.323670798847649, CurrSamplesPerSec=5.71115840184258, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:04:09,744] [INFO] [logging.py:68:log_dist] [Rank 0] step=440, skipped=5, lr=[9.775911746761854e-06], mom=[[0.9, 0.999]]
[2022-12-16 21:04:09,746] [INFO] [timer.py:197:stop] 0/880, RunningAvgSamplesPerSec=6.323698927835154, CurrSamplesPerSec=5.699957111577041, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:04:21,036] [INFO] [timer.py:197:stop] 0/882, RunningAvgSamplesPerSec=6.323752552261068, CurrSamplesPerSec=5.719672819971721, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:04:32,626] [INFO] [timer.py:197:stop] 0/884, RunningAvgSamplesPerSec=6.323770566065425, CurrSamplesPerSec=5.690295486589126, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:04:44,097] [INFO] [timer.py:197:stop] 0/886, RunningAvgSamplesPerSec=6.323742611986993, CurrSamplesPerSec=5.6883239854983, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:04:55,411] [INFO] [timer.py:197:stop] 0/888, RunningAvgSamplesPerSec=6.323806743121909, CurrSamplesPerSec=5.707230045830403, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:05:06,778] [INFO] [timer.py:197:stop] 0/890, RunningAvgSamplesPerSec=6.323798891507006, CurrSamplesPerSec=5.698139293801706, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:05:18,153] [INFO] [timer.py:197:stop] 0/892, RunningAvgSamplesPerSec=6.323797970601452, CurrSamplesPerSec=5.689324155599842, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:05:29,477] [INFO] [timer.py:197:stop] 0/894, RunningAvgSamplesPerSec=6.323804280166466, CurrSamplesPerSec=5.693873238627004, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:05:40,811] [INFO] [timer.py:197:stop] 0/896, RunningAvgSamplesPerSec=6.32381869024255, CurrSamplesPerSec=5.687272353769752, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:05:52,541] [INFO] [timer.py:197:stop] 0/898, RunningAvgSamplesPerSec=6.323304078211752, CurrSamplesPerSec=5.66975376791339, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:06:03,859] [INFO] [logging.py:68:log_dist] [Rank 0] step=450, skipped=5, lr=[9.812484046603779e-06], mom=[[0.9, 0.999]]
[2022-12-16 21:06:03,860] [INFO] [timer.py:197:stop] 0/900, RunningAvgSamplesPerSec=6.323361252627297, CurrSamplesPerSec=5.696158486770488, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0662, 'learning_rate': 9.812484046603779e-06, 'epoch': 1.91}
[2022-12-16 21:06:15,202] [INFO] [timer.py:197:stop] 0/902, RunningAvgSamplesPerSec=6.323392772084618, CurrSamplesPerSec=5.693829035456161, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:06:26,792] [INFO] [timer.py:197:stop] 0/904, RunningAvgSamplesPerSec=6.323328765368771, CurrSamplesPerSec=5.675209477210574, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:06:38,301] [INFO] [timer.py:197:stop] 0/906, RunningAvgSamplesPerSec=6.323365724637102, CurrSamplesPerSec=5.71127262260955, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:06:49,669] [INFO] [timer.py:197:stop] 0/908, RunningAvgSamplesPerSec=6.323331986885105, CurrSamplesPerSec=5.659760969514658, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:07:01,224] [INFO] [timer.py:197:stop] 0/910, RunningAvgSamplesPerSec=6.323368369110442, CurrSamplesPerSec=5.691273423843159, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:07:12,732] [INFO] [timer.py:197:stop] 0/912, RunningAvgSamplesPerSec=6.3233877302253045, CurrSamplesPerSec=5.696535631113576, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:07:24,152] [INFO] [timer.py:197:stop] 0/914, RunningAvgSamplesPerSec=6.323333379923915, CurrSamplesPerSec=5.6425592209689155, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:07:35,594] [INFO] [timer.py:197:stop] 0/916, RunningAvgSamplesPerSec=6.323348683822525, CurrSamplesPerSec=5.708248043439559, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:07:47,094] [INFO] [timer.py:197:stop] 0/918, RunningAvgSamplesPerSec=6.323396275366316, CurrSamplesPerSec=5.695279160815137, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:07:58,467] [INFO] [logging.py:68:log_dist] [Rank 0] step=460, skipped=5, lr=[9.84824356101363e-06], mom=[[0.9, 0.999]]
[2022-12-16 21:07:58,468] [INFO] [timer.py:197:stop] 0/920, RunningAvgSamplesPerSec=6.323380406432734, CurrSamplesPerSec=5.670665239188501, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:08:10,097] [INFO] [timer.py:197:stop] 0/922, RunningAvgSamplesPerSec=6.323363664915916, CurrSamplesPerSec=5.695054176276154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:08:21,496] [INFO] [timer.py:197:stop] 0/924, RunningAvgSamplesPerSec=6.323303982435558, CurrSamplesPerSec=5.652892698429706, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:08:32,970] [INFO] [timer.py:197:stop] 0/926, RunningAvgSamplesPerSec=6.323196876502637, CurrSamplesPerSec=5.609877765122621, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:08:44,327] [INFO] [timer.py:197:stop] 0/928, RunningAvgSamplesPerSec=6.3232058943138485, CurrSamplesPerSec=5.68537928839975, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:08:55,686] [INFO] [timer.py:197:stop] 0/930, RunningAvgSamplesPerSec=6.323205497567241, CurrSamplesPerSec=5.706007909046854, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:09:07,353] [INFO] [timer.py:197:stop] 0/932, RunningAvgSamplesPerSec=6.322831418899166, CurrSamplesPerSec=5.399923920213184, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:09:18,735] [INFO] [timer.py:197:stop] 0/934, RunningAvgSamplesPerSec=6.32280846623546, CurrSamplesPerSec=5.669265454860022, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:09:30,257] [INFO] [timer.py:197:stop] 0/936, RunningAvgSamplesPerSec=6.322880930092381, CurrSamplesPerSec=5.72270659354622, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:09:41,888] [INFO] [timer.py:197:stop] 0/938, RunningAvgSamplesPerSec=6.322911512653137, CurrSamplesPerSec=5.6971461789274604, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:09:53,289] [INFO] [logging.py:68:log_dist] [Rank 0] step=470, skipped=5, lr=[9.883225632758308e-06], mom=[[0.9, 0.999]]
[2022-12-16 21:09:53,292] [INFO] [timer.py:197:stop] 0/940, RunningAvgSamplesPerSec=6.322874284590231, CurrSamplesPerSec=5.65775308872727, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:10:04,647] [INFO] [timer.py:197:stop] 0/942, RunningAvgSamplesPerSec=6.322877183552834, CurrSamplesPerSec=5.6954686352675905, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:10:13,197] [INFO] [timer.py:197:stop] 0/944, RunningAvgSamplesPerSec=6.326150664923261, CurrSamplesPerSec=10.148292026847534, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:10:24,559] [INFO] [timer.py:197:stop] 0/946, RunningAvgSamplesPerSec=6.326127489531616, CurrSamplesPerSec=5.6568171532502465, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:10:35,885] [INFO] [timer.py:197:stop] 0/948, RunningAvgSamplesPerSec=6.3261891603196, CurrSamplesPerSec=5.717331163249521, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:10:47,312] [INFO] [timer.py:197:stop] 0/950, RunningAvgSamplesPerSec=6.32620549424659, CurrSamplesPerSec=5.708362390603286, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0615, 'learning_rate': 9.900435550016748e-06, 'epoch': 2.01}
[2022-12-16 21:10:58,668] [INFO] [timer.py:197:stop] 0/952, RunningAvgSamplesPerSec=6.326192318553131, CurrSamplesPerSec=5.658891651344986, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:11:10,010] [INFO] [timer.py:197:stop] 0/954, RunningAvgSamplesPerSec=6.326204475079454, CurrSamplesPerSec=5.68568973408932, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:11:21,376] [INFO] [timer.py:197:stop] 0/956, RunningAvgSamplesPerSec=6.326191030150801, CurrSamplesPerSec=5.6838889469060225, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:11:32,728] [INFO] [timer.py:197:stop] 0/958, RunningAvgSamplesPerSec=6.326197330889036, CurrSamplesPerSec=5.6822468649952, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:11:44,292] [INFO] [logging.py:68:log_dist] [Rank 0] step=480, skipped=5, lr=[9.917463348331534e-06], mom=[[0.9, 0.999]]
[2022-12-16 21:11:44,298] [INFO] [timer.py:197:stop] 0/960, RunningAvgSamplesPerSec=6.3259327540553665, CurrSamplesPerSec=5.469384964136454, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:11:55,877] [INFO] [timer.py:197:stop] 0/962, RunningAvgSamplesPerSec=6.325935854692364, CurrSamplesPerSec=5.712130639515811, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:12:07,434] [INFO] [timer.py:197:stop] 0/964, RunningAvgSamplesPerSec=6.325921725199111, CurrSamplesPerSec=5.700722626182441, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:12:18,923] [INFO] [timer.py:197:stop] 0/966, RunningAvgSamplesPerSec=6.325759514089763, CurrSamplesPerSec=5.568770311968856, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:12:30,325] [INFO] [timer.py:197:stop] 0/968, RunningAvgSamplesPerSec=6.325740953625998, CurrSamplesPerSec=5.680124922543402, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:12:41,706] [INFO] [timer.py:197:stop] 0/970, RunningAvgSamplesPerSec=6.325739440815068, CurrSamplesPerSec=5.713497922489823, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:12:53,334] [INFO] [timer.py:197:stop] 0/972, RunningAvgSamplesPerSec=6.325375565220704, CurrSamplesPerSec=5.402155811404497, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:13:04,701] [INFO] [timer.py:197:stop] 0/974, RunningAvgSamplesPerSec=6.3253512622784145, CurrSamplesPerSec=5.666073017760117, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:13:16,312] [INFO] [timer.py:197:stop] 0/976, RunningAvgSamplesPerSec=6.325380050460924, CurrSamplesPerSec=5.706268694383452, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:13:28,013] [INFO] [timer.py:197:stop] 0/978, RunningAvgSamplesPerSec=6.325435711327527, CurrSamplesPerSec=5.72600691245085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:13:39,351] [INFO] [logging.py:68:log_dist] [Rank 0] step=490, skipped=5, lr=[9.950987726012135e-06], mom=[[0.9, 0.999]]
[2022-12-16 21:13:39,353] [INFO] [timer.py:197:stop] 0/980, RunningAvgSamplesPerSec=6.325457773977271, CurrSamplesPerSec=5.694198382492763, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:13:50,967] [INFO] [timer.py:197:stop] 0/982, RunningAvgSamplesPerSec=6.325489117445667, CurrSamplesPerSec=5.702591761786533, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:14:02,677] [INFO] [timer.py:197:stop] 0/984, RunningAvgSamplesPerSec=6.325408873295665, CurrSamplesPerSec=5.6763934878966475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:14:14,003] [INFO] [timer.py:197:stop] 0/986, RunningAvgSamplesPerSec=6.325441453698785, CurrSamplesPerSec=5.705198777174381, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:14:25,351] [INFO] [timer.py:197:stop] 0/988, RunningAvgSamplesPerSec=6.325463222633959, CurrSamplesPerSec=5.6933520230469705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:14:36,810] [INFO] [timer.py:197:stop] 0/990, RunningAvgSamplesPerSec=6.325347026707095, CurrSamplesPerSec=5.7087785461304295, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:14:48,122] [INFO] [timer.py:197:stop] 0/992, RunningAvgSamplesPerSec=6.3254064143464, CurrSamplesPerSec=5.71673381308937, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:14:59,654] [INFO] [timer.py:197:stop] 0/994, RunningAvgSamplesPerSec=6.325195870913208, CurrSamplesPerSec=5.491621718647289, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:15:11,087] [INFO] [timer.py:197:stop] 0/996, RunningAvgSamplesPerSec=6.325231062084005, CurrSamplesPerSec=5.711203360535411, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:15:22,418] [INFO] [timer.py:197:stop] 0/998, RunningAvgSamplesPerSec=6.3252868808302996, CurrSamplesPerSec=5.712874868002988, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:15:33,989] [INFO] [logging.py:68:log_dist] [Rank 0] step=500, skipped=5, lr=[9.98382788472848e-06], mom=[[0.9, 0.999]]
[2022-12-16 21:15:33,991] [INFO] [timer.py:197:stop] 0/1000, RunningAvgSamplesPerSec=6.325035740808991, CurrSamplesPerSec=5.474089503245498, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0323, 'learning_rate': 9.98382788472848e-06, 'epoch': 2.12}
[2022-12-16 21:15:45,433] [INFO] [timer.py:197:stop] 0/1002, RunningAvgSamplesPerSec=6.325084596844273, CurrSamplesPerSec=5.713662342040977, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:15:56,836] [INFO] [timer.py:197:stop] 0/1004, RunningAvgSamplesPerSec=6.3251158291307386, CurrSamplesPerSec=5.713022229276716, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:16:08,258] [INFO] [timer.py:197:stop] 0/1006, RunningAvgSamplesPerSec=6.325055351476069, CurrSamplesPerSec=5.623066744820601, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:16:19,582] [INFO] [timer.py:197:stop] 0/1008, RunningAvgSamplesPerSec=6.325102609875333, CurrSamplesPerSec=5.707804050857386, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:16:30,974] [INFO] [timer.py:197:stop] 0/1010, RunningAvgSamplesPerSec=6.325160154746798, CurrSamplesPerSec=5.720153278154641, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:16:42,337] [INFO] [timer.py:197:stop] 0/1012, RunningAvgSamplesPerSec=6.32517427631391, CurrSamplesPerSec=5.688713354181847, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:16:53,934] [INFO] [timer.py:197:stop] 0/1014, RunningAvgSamplesPerSec=6.325238765075366, CurrSamplesPerSec=5.72578779867803, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:17:05,268] [INFO] [timer.py:197:stop] 0/1016, RunningAvgSamplesPerSec=6.325275504519772, CurrSamplesPerSec=5.704129988366727, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:17:16,885] [INFO] [timer.py:197:stop] 0/1018, RunningAvgSamplesPerSec=6.324959984011779, CurrSamplesPerSec=5.429061828725207, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:17:28,234] [INFO] [logging.py:68:log_dist] [Rank 0] step=510, skipped=5, lr=[9.991111111111112e-06], mom=[[0.9, 0.999]]
[2022-12-16 21:17:28,236] [INFO] [timer.py:197:stop] 0/1020, RunningAvgSamplesPerSec=6.3249907983214415, CurrSamplesPerSec=5.71215640831638, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:17:39,550] [INFO] [timer.py:197:stop] 0/1022, RunningAvgSamplesPerSec=6.325041232963377, CurrSamplesPerSec=5.714046186639779, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:17:51,113] [INFO] [timer.py:197:stop] 0/1024, RunningAvgSamplesPerSec=6.32477933640267, CurrSamplesPerSec=5.468682765885077, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:18:02,491] [INFO] [timer.py:197:stop] 0/1026, RunningAvgSamplesPerSec=6.324800913545075, CurrSamplesPerSec=5.684177323585509, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:18:13,861] [INFO] [timer.py:197:stop] 0/1028, RunningAvgSamplesPerSec=6.324787956456297, CurrSamplesPerSec=5.6712122626343895, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:18:25,298] [INFO] [timer.py:197:stop] 0/1030, RunningAvgSamplesPerSec=6.324702766601665, CurrSamplesPerSec=5.589238563916994, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:18:36,685] [INFO] [timer.py:197:stop] 0/1032, RunningAvgSamplesPerSec=6.324675894419773, CurrSamplesPerSec=5.675779219192053, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:18:48,075] [INFO] [timer.py:197:stop] 0/1034, RunningAvgSamplesPerSec=6.324696787274121, CurrSamplesPerSec=5.694624797553552, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:18:59,489] [INFO] [timer.py:197:stop] 0/1036, RunningAvgSamplesPerSec=6.324615631271562, CurrSamplesPerSec=5.6163487641731304, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:19:10,966] [INFO] [timer.py:197:stop] 0/1038, RunningAvgSamplesPerSec=6.32453266339884, CurrSamplesPerSec=5.621838704111991, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:19:22,518] [INFO] [logging.py:68:log_dist] [Rank 0] step=520, skipped=5, lr=[9.96888888888889e-06], mom=[[0.9, 0.999]]
[2022-12-16 21:19:22,519] [INFO] [timer.py:197:stop] 0/1040, RunningAvgSamplesPerSec=6.324513595083433, CurrSamplesPerSec=5.658441467170767, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:19:33,943] [INFO] [timer.py:197:stop] 0/1042, RunningAvgSamplesPerSec=6.324515006176603, CurrSamplesPerSec=5.7085328273627, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:19:45,299] [INFO] [timer.py:197:stop] 0/1044, RunningAvgSamplesPerSec=6.324501904454941, CurrSamplesPerSec=5.677590721690172, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:19:56,828] [INFO] [timer.py:197:stop] 0/1046, RunningAvgSamplesPerSec=6.324538332751554, CurrSamplesPerSec=5.715457708942437, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:20:08,266] [INFO] [timer.py:197:stop] 0/1048, RunningAvgSamplesPerSec=6.324571637922376, CurrSamplesPerSec=5.7061095518455724, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:20:19,629] [INFO] [timer.py:197:stop] 0/1050, RunningAvgSamplesPerSec=6.324574801815149, CurrSamplesPerSec=5.683213133585038, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0333, 'learning_rate': 9.957777777777779e-06, 'epoch': 2.22}
[2022-12-16 21:20:30,987] [INFO] [timer.py:197:stop] 0/1052, RunningAvgSamplesPerSec=6.324583864783749, CurrSamplesPerSec=5.699014179541857, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:20:42,360] [INFO] [timer.py:197:stop] 0/1054, RunningAvgSamplesPerSec=6.324593248570674, CurrSamplesPerSec=5.691380817152285, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:20:53,737] [INFO] [timer.py:197:stop] 0/1056, RunningAvgSamplesPerSec=6.32459481413206, CurrSamplesPerSec=5.6858901328649205, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:21:05,096] [INFO] [timer.py:197:stop] 0/1058, RunningAvgSamplesPerSec=6.324578587579753, CurrSamplesPerSec=5.666338298872446, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:21:16,463] [INFO] [logging.py:68:log_dist] [Rank 0] step=530, skipped=5, lr=[9.946666666666667e-06], mom=[[0.9, 0.999]]
[2022-12-16 21:21:16,465] [INFO] [timer.py:197:stop] 0/1060, RunningAvgSamplesPerSec=6.324570744039958, CurrSamplesPerSec=5.681243890929784, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:21:27,793] [INFO] [timer.py:197:stop] 0/1062, RunningAvgSamplesPerSec=6.324617494273643, CurrSamplesPerSec=5.72019130873348, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:21:39,228] [INFO] [timer.py:197:stop] 0/1064, RunningAvgSamplesPerSec=6.324530287745588, CurrSamplesPerSec=5.622556998055028, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:21:50,614] [INFO] [timer.py:197:stop] 0/1066, RunningAvgSamplesPerSec=6.324571372955135, CurrSamplesPerSec=5.71121672676978, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:22:02,217] [INFO] [timer.py:197:stop] 0/1068, RunningAvgSamplesPerSec=6.324523452438587, CurrSamplesPerSec=5.70700047614419, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:22:13,577] [INFO] [timer.py:197:stop] 0/1070, RunningAvgSamplesPerSec=6.324533424174116, CurrSamplesPerSec=5.6957316002942715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:22:25,169] [INFO] [timer.py:197:stop] 0/1072, RunningAvgSamplesPerSec=6.324543786478323, CurrSamplesPerSec=5.690434930076675, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:22:36,671] [INFO] [timer.py:197:stop] 0/1074, RunningAvgSamplesPerSec=6.324471864626404, CurrSamplesPerSec=5.684128937809232, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:22:48,004] [INFO] [timer.py:197:stop] 0/1076, RunningAvgSamplesPerSec=6.324501261828952, CurrSamplesPerSec=5.704121018830548, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:22:59,356] [INFO] [timer.py:197:stop] 0/1078, RunningAvgSamplesPerSec=6.3244863543294665, CurrSamplesPerSec=5.678301472905768, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:23:10,753] [INFO] [logging.py:68:log_dist] [Rank 0] step=540, skipped=5, lr=[9.924444444444445e-06], mom=[[0.9, 0.999]]
[2022-12-16 21:23:10,755] [INFO] [timer.py:197:stop] 0/1080, RunningAvgSamplesPerSec=6.324440605807113, CurrSamplesPerSec=5.695234694147735, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:23:22,089] [INFO] [timer.py:197:stop] 0/1082, RunningAvgSamplesPerSec=6.324469069745208, CurrSamplesPerSec=5.685915906328817, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:23:33,429] [INFO] [timer.py:197:stop] 0/1084, RunningAvgSamplesPerSec=6.324481265006702, CurrSamplesPerSec=5.698475086538935, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:23:44,770] [INFO] [timer.py:197:stop] 0/1086, RunningAvgSamplesPerSec=6.3245014505218595, CurrSamplesPerSec=5.715240861964207, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:23:56,096] [INFO] [timer.py:197:stop] 0/1088, RunningAvgSamplesPerSec=6.324539515791824, CurrSamplesPerSec=5.726388508460027, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:24:07,404] [INFO] [timer.py:197:stop] 0/1090, RunningAvgSamplesPerSec=6.324577731653262, CurrSamplesPerSec=5.710771543316395, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:24:19,122] [INFO] [timer.py:197:stop] 0/1092, RunningAvgSamplesPerSec=6.324580708869267, CurrSamplesPerSec=5.683601562199607, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:24:30,678] [INFO] [timer.py:197:stop] 0/1094, RunningAvgSamplesPerSec=6.324609515934535, CurrSamplesPerSec=5.708574831147178, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:24:42,259] [INFO] [timer.py:197:stop] 0/1096, RunningAvgSamplesPerSec=6.32433704712248, CurrSamplesPerSec=5.450139213985013, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:24:53,756] [INFO] [timer.py:197:stop] 0/1098, RunningAvgSamplesPerSec=6.3243561501135, CurrSamplesPerSec=5.696628716087094, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:25:05,322] [INFO] [logging.py:68:log_dist] [Rank 0] step=550, skipped=5, lr=[9.902222222222223e-06], mom=[[0.9, 0.999]]
[2022-12-16 21:25:05,323] [INFO] [timer.py:197:stop] 0/1100, RunningAvgSamplesPerSec=6.324366355272438, CurrSamplesPerSec=5.685196504972429, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0356, 'learning_rate': 9.902222222222223e-06, 'epoch': 2.33}
[2022-12-16 21:25:16,754] [INFO] [timer.py:197:stop] 0/1102, RunningAvgSamplesPerSec=6.324272647895259, CurrSamplesPerSec=5.596251823753526, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:25:28,285] [INFO] [timer.py:197:stop] 0/1104, RunningAvgSamplesPerSec=6.3242917552540545, CurrSamplesPerSec=5.701883879606733, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:25:39,645] [INFO] [timer.py:197:stop] 0/1106, RunningAvgSamplesPerSec=6.324275104668984, CurrSamplesPerSec=5.686045740695403, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:25:51,097] [INFO] [timer.py:197:stop] 0/1108, RunningAvgSamplesPerSec=6.3241656983143075, CurrSamplesPerSec=5.577740987465537, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 21:26:02,481] [INFO] [timer.py:197:stop] 0/1110, RunningAvgSamplesPerSec=6.324114079876728,
CurrSamplesPerSec=5.626380230354328, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:26:13,901] [INFO] [timer.py:197:stop] 0/1112, RunningAvgSamplesPerSec=6.324040600499698, CurrSamplesPerSec=5.613003653490731, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:26:25,478] [INFO] [timer.py:197:stop] 0/1114, RunningAvgSamplesPerSec=6.323775004653686, CurrSamplesPerSec=5.447233828195022, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:26:36,786] [INFO] [timer.py:197:stop] 0/1116, RunningAvgSamplesPerSec=6.3238317786318134, CurrSamplesPerSec=5.725627077052026, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:26:48,119] [INFO] [timer.py:197:stop] 0/1118, RunningAvgSamplesPerSec=6.323858000238239, CurrSamplesPerSec=5.6966565212796345, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:26:59,437] [INFO] [logging.py:68:log_dist] [Rank 0] step=560, skipped=5, lr=[9.88e-06], mom=[[0.9, 0.999]] [2022-12-16 21:26:59,438] [INFO] [timer.py:197:stop] 0/1120, RunningAvgSamplesPerSec=6.323885458404519, CurrSamplesPerSec=5.6730858326379225, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:27:10,785] [INFO] [timer.py:197:stop] 0/1122, RunningAvgSamplesPerSec=6.323900620387515, CurrSamplesPerSec=5.690677163038752, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:27:22,351] [INFO] [timer.py:197:stop] 0/1124, RunningAvgSamplesPerSec=6.323915903130612, CurrSamplesPerSec=5.696965539863474, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:27:34,000] [INFO] [timer.py:197:stop] 0/1126, RunningAvgSamplesPerSec=6.323864250923531, CurrSamplesPerSec=5.662085075555017, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:27:45,372] [INFO] [timer.py:197:stop] 0/1128, RunningAvgSamplesPerSec=6.323853592054257, CurrSamplesPerSec=5.660074352251678, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:27:56,729] [INFO] [timer.py:197:stop] 0/1130, RunningAvgSamplesPerSec=6.323836945601488, 
CurrSamplesPerSec=5.681051033395681, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:28:08,190] [INFO] [timer.py:197:stop] 0/1132, RunningAvgSamplesPerSec=6.323840085148342, CurrSamplesPerSec=5.6790268195333224, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:28:19,494] [INFO] [timer.py:197:stop] 0/1134, RunningAvgSamplesPerSec=6.323901462456079, CurrSamplesPerSec=5.720255425533773, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:28:30,806] [INFO] [timer.py:197:stop] 0/1136, RunningAvgSamplesPerSec=6.323955250686382, CurrSamplesPerSec=5.715921149785817, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:28:42,142] [INFO] [timer.py:197:stop] 0/1138, RunningAvgSamplesPerSec=6.323982101549626, CurrSamplesPerSec=5.718288206433784, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:28:53,449] [INFO] [logging.py:68:log_dist] [Rank 0] step=570, skipped=5, lr=[9.857777777777778e-06], mom=[[0.9, 0.999]] [2022-12-16 21:28:53,450] [INFO] [timer.py:197:stop] 0/1140, RunningAvgSamplesPerSec=6.324037655604661, CurrSamplesPerSec=5.70284278480875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:29:04,783] [INFO] [timer.py:197:stop] 0/1142, RunningAvgSamplesPerSec=6.324054750381438, CurrSamplesPerSec=5.6922710102938465, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:29:16,157] [INFO] [timer.py:197:stop] 0/1144, RunningAvgSamplesPerSec=6.324025693628083, CurrSamplesPerSec=5.708841678873624, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:29:27,507] [INFO] [timer.py:197:stop] 0/1146, RunningAvgSamplesPerSec=6.324022208813268, CurrSamplesPerSec=5.691424016952267, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:29:38,855] [INFO] [timer.py:197:stop] 0/1148, RunningAvgSamplesPerSec=6.32402470004462, CurrSamplesPerSec=5.684709862270304, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:29:50,263] [INFO] [timer.py:197:stop] 0/1150, RunningAvgSamplesPerSec=6.324047398655615, 
CurrSamplesPerSec=5.70717083158569, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0348, 'learning_rate': 9.846666666666668e-06, 'epoch': 2.44} [2022-12-16 21:30:01,599] [INFO] [timer.py:197:stop] 0/1152, RunningAvgSamplesPerSec=6.324083612239293, CurrSamplesPerSec=5.733073456941207, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:30:13,060] [INFO] [timer.py:197:stop] 0/1154, RunningAvgSamplesPerSec=6.32398511164039, CurrSamplesPerSec=5.608629462518138, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:30:24,764] [INFO] [timer.py:197:stop] 0/1156, RunningAvgSamplesPerSec=6.32395490744807, CurrSamplesPerSec=5.665791976052485, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:30:36,281] [INFO] [timer.py:197:stop] 0/1158, RunningAvgSamplesPerSec=6.323960705938503, CurrSamplesPerSec=5.7071327312664195, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:30:47,832] [INFO] [logging.py:68:log_dist] [Rank 0] step=580, skipped=5, lr=[9.835555555555556e-06], mom=[[0.9, 0.999]] [2022-12-16 21:30:47,832] [INFO] [timer.py:197:stop] 0/1160, RunningAvgSamplesPerSec=6.323719233865647, CurrSamplesPerSec=5.472872328800012, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:30:59,391] [INFO] [timer.py:197:stop] 0/1162, RunningAvgSamplesPerSec=6.32370184523637, CurrSamplesPerSec=5.674197471333234, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:31:10,850] [INFO] [timer.py:197:stop] 0/1164, RunningAvgSamplesPerSec=6.323718795363503, CurrSamplesPerSec=5.697172538195727, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:31:22,211] [INFO] [timer.py:197:stop] 0/1166, RunningAvgSamplesPerSec=6.3237189852523725, CurrSamplesPerSec=5.701931114731109, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:31:33,559] [INFO] [timer.py:197:stop] 0/1168, RunningAvgSamplesPerSec=6.323734374802737, CurrSamplesPerSec=5.700105501726921, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:31:44,892] [INFO] 
[timer.py:197:stop] 0/1170, RunningAvgSamplesPerSec=6.323763582374604, CurrSamplesPerSec=5.704374116034424, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:31:56,244] [INFO] [timer.py:197:stop] 0/1172, RunningAvgSamplesPerSec=6.32377243788173, CurrSamplesPerSec=5.694551589747411, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:32:07,599] [INFO] [timer.py:197:stop] 0/1174, RunningAvgSamplesPerSec=6.323779297278022, CurrSamplesPerSec=5.6859893740268745, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:32:18,933] [INFO] [timer.py:197:stop] 0/1176, RunningAvgSamplesPerSec=6.323790989360503, CurrSamplesPerSec=5.69161975163729, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:32:30,268] [INFO] [timer.py:197:stop] 0/1178, RunningAvgSamplesPerSec=6.323817303486766, CurrSamplesPerSec=5.726107314664906, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:32:41,713] [INFO] [logging.py:68:log_dist] [Rank 0] step=590, skipped=5, lr=[9.813333333333333e-06], mom=[[0.9, 0.999]] [2022-12-16 21:32:41,714] [INFO] [timer.py:197:stop] 0/1180, RunningAvgSamplesPerSec=6.323843584883303, CurrSamplesPerSec=5.706581425626131, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:32:53,026] [INFO] [timer.py:197:stop] 0/1182, RunningAvgSamplesPerSec=6.3238942954570305, CurrSamplesPerSec=5.729338187061064, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:33:04,406] [INFO] [timer.py:197:stop] 0/1184, RunningAvgSamplesPerSec=6.323874179833749, CurrSamplesPerSec=5.657513889005977, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:33:15,727] [INFO] [timer.py:197:stop] 0/1186, RunningAvgSamplesPerSec=6.323901512870132, CurrSamplesPerSec=5.709603024005327, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:33:27,073] [INFO] [timer.py:197:stop] 0/1188, RunningAvgSamplesPerSec=6.323900985123022, CurrSamplesPerSec=5.709132107478294, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:33:38,449] [INFO] 
[timer.py:197:stop] 0/1190, RunningAvgSamplesPerSec=6.323886756213642, CurrSamplesPerSec=5.694773393442737, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:33:49,761] [INFO] [timer.py:197:stop] 0/1192, RunningAvgSamplesPerSec=6.323938229239878, CurrSamplesPerSec=5.711202145426297, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:34:01,125] [INFO] [timer.py:197:stop] 0/1194, RunningAvgSamplesPerSec=6.323934641391922, CurrSamplesPerSec=5.702443242161101, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:34:12,448] [INFO] [timer.py:197:stop] 0/1196, RunningAvgSamplesPerSec=6.323959876909366, CurrSamplesPerSec=5.703602045869347, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:34:23,812] [INFO] [timer.py:197:stop] 0/1198, RunningAvgSamplesPerSec=6.323975224149886, CurrSamplesPerSec=5.688260341444645, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:34:35,164] [INFO] [logging.py:68:log_dist] [Rank 0] step=600, skipped=5, lr=[9.791111111111112e-06], mom=[[0.9, 0.999]] [2022-12-16 21:34:35,166] [INFO] [timer.py:197:stop] 0/1200, RunningAvgSamplesPerSec=6.323984429999333, CurrSamplesPerSec=5.689274958682965, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.034, 'learning_rate': 9.791111111111112e-06, 'epoch': 2.54} [2022-12-16 21:34:46,518] [INFO] [timer.py:197:stop] 0/1202, RunningAvgSamplesPerSec=6.323997457450842, CurrSamplesPerSec=5.695443016811219, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:34:58,030] [INFO] [timer.py:197:stop] 0/1204, RunningAvgSamplesPerSec=6.324019030245103, CurrSamplesPerSec=5.679552865802275, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:35:09,353] [INFO] [timer.py:197:stop] 0/1206, RunningAvgSamplesPerSec=6.324070278143984, CurrSamplesPerSec=5.712041909103797, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:35:20,728] [INFO] [timer.py:197:stop] 0/1208, RunningAvgSamplesPerSec=6.324057885317574, CurrSamplesPerSec=5.679879741258521, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-16 21:35:32,055] [INFO] [timer.py:197:stop] 0/1210, RunningAvgSamplesPerSec=6.324093491416535, CurrSamplesPerSec=5.708959449453373, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:35:43,396] [INFO] [timer.py:197:stop] 0/1212, RunningAvgSamplesPerSec=6.324121293486445, CurrSamplesPerSec=5.702608722105414, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:35:54,754] [INFO] [timer.py:197:stop] 0/1214, RunningAvgSamplesPerSec=6.32410811693974, CurrSamplesPerSec=5.677839068060557, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:36:06,166] [INFO] [timer.py:197:stop] 0/1216, RunningAvgSamplesPerSec=6.324056865254654, CurrSamplesPerSec=5.649525567777039, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:36:17,527] [INFO] [timer.py:197:stop] 0/1218, RunningAvgSamplesPerSec=6.324057753210374, CurrSamplesPerSec=5.6874906986638285, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:36:28,891] [INFO] [logging.py:68:log_dist] [Rank 0] step=610, skipped=5, lr=[9.76888888888889e-06], mom=[[0.9, 0.999]] [2022-12-16 21:36:28,892] [INFO] [timer.py:197:stop] 0/1220, RunningAvgSamplesPerSec=6.324054379245348, CurrSamplesPerSec=5.6906122599537055, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:36:40,250] [INFO] [timer.py:197:stop] 0/1222, RunningAvgSamplesPerSec=6.324061368820376, CurrSamplesPerSec=5.700368410913226, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:36:51,599] [INFO] [timer.py:197:stop] 0/1224, RunningAvgSamplesPerSec=6.324082612140597, CurrSamplesPerSec=5.709710138608091, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:37:02,945] [INFO] [timer.py:197:stop] 0/1226, RunningAvgSamplesPerSec=6.324099967940003, CurrSamplesPerSec=5.710087379174428, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:37:14,312] [INFO] [timer.py:197:stop] 0/1228, RunningAvgSamplesPerSec=6.324095213948072, CurrSamplesPerSec=5.6951648538816695, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-16 21:37:25,643] [INFO] [timer.py:197:stop] 0/1230, RunningAvgSamplesPerSec=6.3241268422499015, CurrSamplesPerSec=5.732377573527465, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:37:36,998] [INFO] [timer.py:197:stop] 0/1232, RunningAvgSamplesPerSec=6.32411884613214, CurrSamplesPerSec=5.675400258437628, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:37:48,363] [INFO] [timer.py:197:stop] 0/1234, RunningAvgSamplesPerSec=6.3241130252354365, CurrSamplesPerSec=5.674158610594783, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:37:59,689] [INFO] [timer.py:197:stop] 0/1236, RunningAvgSamplesPerSec=6.324133915811186, CurrSamplesPerSec=5.713030983651014, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:38:11,060] [INFO] [timer.py:197:stop] 0/1238, RunningAvgSamplesPerSec=6.324125635080051, CurrSamplesPerSec=5.683048536732076, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:38:22,399] [INFO] [logging.py:68:log_dist] [Rank 0] step=620, skipped=5, lr=[9.746666666666668e-06], mom=[[0.9, 0.999]] [2022-12-16 21:38:22,401] [INFO] [timer.py:197:stop] 0/1240, RunningAvgSamplesPerSec=6.3241300316642866, CurrSamplesPerSec=5.6885821924612765, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:38:33,727] [INFO] [timer.py:197:stop] 0/1242, RunningAvgSamplesPerSec=6.324153194541566, CurrSamplesPerSec=5.721168331625024, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:38:45,078] [INFO] [timer.py:197:stop] 0/1244, RunningAvgSamplesPerSec=6.324166885839952, CurrSamplesPerSec=5.6995364324043365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:38:56,402] [INFO] [timer.py:197:stop] 0/1246, RunningAvgSamplesPerSec=6.324190757241607, CurrSamplesPerSec=5.701532184211459, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:39:07,779] [INFO] [timer.py:197:stop] 0/1248, RunningAvgSamplesPerSec=6.324176492543799, CurrSamplesPerSec=5.685249002830177, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-16 21:39:19,130] [INFO] [timer.py:197:stop] 0/1250, RunningAvgSamplesPerSec=6.3241874762463635, CurrSamplesPerSec=5.705175011182494, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0339, 'learning_rate': 9.735555555555556e-06, 'epoch': 2.65} [2022-12-16 21:39:30,528] [INFO] [timer.py:197:stop] 0/1252, RunningAvgSamplesPerSec=6.324153428015963, CurrSamplesPerSec=5.670633614253161, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:39:41,877] [INFO] [timer.py:197:stop] 0/1254, RunningAvgSamplesPerSec=6.324166879634101, CurrSamplesPerSec=5.691319276670348, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:39:53,239] [INFO] [timer.py:197:stop] 0/1256, RunningAvgSamplesPerSec=6.324167247062276, CurrSamplesPerSec=5.681665722246253, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:40:04,580] [INFO] [timer.py:197:stop] 0/1258, RunningAvgSamplesPerSec=6.324174890989224, CurrSamplesPerSec=5.681159003145008, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:40:15,934] [INFO] [logging.py:68:log_dist] [Rank 0] step=630, skipped=5, lr=[9.724444444444445e-06], mom=[[0.9, 0.999]] [2022-12-16 21:40:15,936] [INFO] [timer.py:197:stop] 0/1260, RunningAvgSamplesPerSec=6.324179951438752, CurrSamplesPerSec=5.6964870347127015, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:40:27,272] [INFO] [timer.py:197:stop] 0/1262, RunningAvgSamplesPerSec=6.324214385657734, CurrSamplesPerSec=5.709893530081676, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:40:38,622] [INFO] [timer.py:197:stop] 0/1264, RunningAvgSamplesPerSec=6.324239302104727, CurrSamplesPerSec=5.69377396361336, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:40:49,947] [INFO] [timer.py:197:stop] 0/1266, RunningAvgSamplesPerSec=6.324277602074355, CurrSamplesPerSec=5.720856194199715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:41:01,249] [INFO] [timer.py:197:stop] 0/1268, 
RunningAvgSamplesPerSec=6.324335850701666, CurrSamplesPerSec=5.734853355839019, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:41:12,564] [INFO] [timer.py:197:stop] 0/1270, RunningAvgSamplesPerSec=6.324382124670813, CurrSamplesPerSec=5.71265311090979, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:41:23,900] [INFO] [timer.py:197:stop] 0/1272, RunningAvgSamplesPerSec=6.3243761836890755, CurrSamplesPerSec=5.69152514083604, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:41:35,201] [INFO] [timer.py:197:stop] 0/1274, RunningAvgSamplesPerSec=6.324422547556908, CurrSamplesPerSec=5.7167786161089635, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:41:46,505] [INFO] [timer.py:197:stop] 0/1276, RunningAvgSamplesPerSec=6.324480723648808, CurrSamplesPerSec=5.724640229146444, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:41:57,832] [INFO] [timer.py:197:stop] 0/1278, RunningAvgSamplesPerSec=6.324515620440186, CurrSamplesPerSec=5.7136565044984415, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:42:09,174] [INFO] [logging.py:68:log_dist] [Rank 0] step=640, skipped=5, lr=[9.702222222222223e-06], mom=[[0.9, 0.999]] [2022-12-16 21:42:09,176] [INFO] [timer.py:197:stop] 0/1280, RunningAvgSamplesPerSec=6.32453233632449, CurrSamplesPerSec=5.711460003166846, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:42:20,519] [INFO] [timer.py:197:stop] 0/1282, RunningAvgSamplesPerSec=6.324589203102045, CurrSamplesPerSec=5.729852069808556, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:42:31,881] [INFO] [timer.py:197:stop] 0/1284, RunningAvgSamplesPerSec=6.324574959448107, CurrSamplesPerSec=5.683294472976667, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:42:43,238] [INFO] [timer.py:197:stop] 0/1286, RunningAvgSamplesPerSec=6.324582465903206, CurrSamplesPerSec=5.690985533237665, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:42:54,569] [INFO] [timer.py:197:stop] 0/1288, 
RunningAvgSamplesPerSec=6.3246143105854, CurrSamplesPerSec=5.701603391759544, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:43:06,039] [INFO] [timer.py:197:stop] 0/1290, RunningAvgSamplesPerSec=6.3245165663862775, CurrSamplesPerSec=5.573252155643943, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:43:17,387] [INFO] [timer.py:197:stop] 0/1292, RunningAvgSamplesPerSec=6.324514180927693, CurrSamplesPerSec=5.684713233086082, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:43:28,763] [INFO] [timer.py:197:stop] 0/1294, RunningAvgSamplesPerSec=6.324465620148037, CurrSamplesPerSec=5.665202955258279, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:43:40,109] [INFO] [timer.py:197:stop] 0/1296, RunningAvgSamplesPerSec=6.324476536227191, CurrSamplesPerSec=5.703485950413083, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:43:51,455] [INFO] [timer.py:197:stop] 0/1298, RunningAvgSamplesPerSec=6.32448733334034, CurrSamplesPerSec=5.699092583967394, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:44:02,827] [INFO] [logging.py:68:log_dist] [Rank 0] step=650, skipped=5, lr=[9.68e-06], mom=[[0.9, 0.999]] [2022-12-16 21:44:02,829] [INFO] [timer.py:197:stop] 0/1300, RunningAvgSamplesPerSec=6.324472640902777, CurrSamplesPerSec=5.674295345181151, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0354, 'learning_rate': 9.68e-06, 'epoch': 2.75} [2022-12-16 21:44:14,174] [INFO] [timer.py:197:stop] 0/1302, RunningAvgSamplesPerSec=6.324469450338527, CurrSamplesPerSec=5.7018911464980055, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:44:25,494] [INFO] [timer.py:197:stop] 0/1304, RunningAvgSamplesPerSec=6.324474296992287, CurrSamplesPerSec=5.684456580977391, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:44:36,890] [INFO] [timer.py:197:stop] 0/1306, RunningAvgSamplesPerSec=6.324445030815587, CurrSamplesPerSec=5.6593521681943075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:44:48,254] 
[INFO] [timer.py:197:stop] 0/1308, RunningAvgSamplesPerSec=6.3244409931410575, CurrSamplesPerSec=5.66902528072435, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:44:59,617] [INFO] [timer.py:197:stop] 0/1310, RunningAvgSamplesPerSec=6.324437449468802, CurrSamplesPerSec=5.692400893631308, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:45:10,999] [INFO] [timer.py:197:stop] 0/1312, RunningAvgSamplesPerSec=6.324450667657297, CurrSamplesPerSec=5.703528849436146, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:45:22,350] [INFO] [timer.py:197:stop] 0/1314, RunningAvgSamplesPerSec=6.324459340859516, CurrSamplesPerSec=5.700481716272428, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:45:33,715] [INFO] [timer.py:197:stop] 0/1316, RunningAvgSamplesPerSec=6.324453866475892, CurrSamplesPerSec=5.684826157724787, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:45:45,083] [INFO] [timer.py:197:stop] 0/1318, RunningAvgSamplesPerSec=6.324431190376394, CurrSamplesPerSec=5.692041676866035, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:45:56,680] [INFO] [logging.py:68:log_dist] [Rank 0] step=660, skipped=5, lr=[9.657777777777778e-06], mom=[[0.9, 0.999]] [2022-12-16 21:45:56,682] [INFO] [timer.py:197:stop] 0/1320, RunningAvgSamplesPerSec=6.32442696063329, CurrSamplesPerSec=5.687795349133654, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:46:08,055] [INFO] [timer.py:197:stop] 0/1322, RunningAvgSamplesPerSec=6.324400513545528, CurrSamplesPerSec=5.6830059452395725, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:46:19,405] [INFO] [timer.py:197:stop] 0/1324, RunningAvgSamplesPerSec=6.324409371583161, CurrSamplesPerSec=5.70627475944235, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:46:30,768] [INFO] [timer.py:197:stop] 0/1326, RunningAvgSamplesPerSec=6.3243939378709815, CurrSamplesPerSec=5.6951795951100905, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:46:42,121] [INFO] 
[timer.py:197:stop] 0/1328, RunningAvgSamplesPerSec=6.324402324882206, CurrSamplesPerSec=5.699022649051794, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:46:53,470] [INFO] [timer.py:197:stop] 0/1330, RunningAvgSamplesPerSec=6.324400154231076, CurrSamplesPerSec=5.6892438493070046, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:47:04,850] [INFO] [timer.py:197:stop] 0/1332, RunningAvgSamplesPerSec=6.324392944136967, CurrSamplesPerSec=5.685559434291239, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:47:16,202] [INFO] [timer.py:197:stop] 0/1334, RunningAvgSamplesPerSec=6.3243919800862995, CurrSamplesPerSec=5.687262473216299, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:47:27,578] [INFO] [timer.py:197:stop] 0/1336, RunningAvgSamplesPerSec=6.3243769326191455, CurrSamplesPerSec=5.686703434412646, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:47:39,062] [INFO] [timer.py:197:stop] 0/1338, RunningAvgSamplesPerSec=6.324378950957167, CurrSamplesPerSec=5.691266183990615, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:47:50,436] [INFO] [logging.py:68:log_dist] [Rank 0] step=670, skipped=5, lr=[9.635555555555557e-06], mom=[[0.9, 0.999]] [2022-12-16 21:47:50,438] [INFO] [timer.py:197:stop] 0/1340, RunningAvgSamplesPerSec=6.324378271402667, CurrSamplesPerSec=5.691395056120047, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:48:01,811] [INFO] [timer.py:197:stop] 0/1342, RunningAvgSamplesPerSec=6.324364127126357, CurrSamplesPerSec=5.663703573734233, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:48:13,192] [INFO] [timer.py:197:stop] 0/1344, RunningAvgSamplesPerSec=6.324348771990201, CurrSamplesPerSec=5.698139777624442, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:48:24,528] [INFO] [timer.py:197:stop] 0/1346, RunningAvgSamplesPerSec=6.324374215372536, CurrSamplesPerSec=5.702518106714946, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:48:35,895] [INFO] 
[timer.py:197:stop] 0/1348, RunningAvgSamplesPerSec=6.3243590936320375, CurrSamplesPerSec=5.687893692645465, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:48:47,295] [INFO] [timer.py:197:stop] 0/1350, RunningAvgSamplesPerSec=6.324326854499699, CurrSamplesPerSec=5.655780951123609, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0374, 'learning_rate': 9.624444444444445e-06, 'epoch': 2.86} [2022-12-16 21:48:58,904] [INFO] [timer.py:197:stop] 0/1352, RunningAvgSamplesPerSec=6.324318267548198, CurrSamplesPerSec=5.6796292936049895, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:49:10,273] [INFO] [timer.py:197:stop] 0/1354, RunningAvgSamplesPerSec=6.324296695879865, CurrSamplesPerSec=5.667203924328584, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:49:21,654] [INFO] [timer.py:197:stop] 0/1356, RunningAvgSamplesPerSec=6.32428018514822, CurrSamplesPerSec=5.6751492458016735, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:49:33,006] [INFO] [timer.py:197:stop] 0/1358, RunningAvgSamplesPerSec=6.324275196063561, CurrSamplesPerSec=5.69712223815206, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:49:44,386] [INFO] [logging.py:68:log_dist] [Rank 0] step=680, skipped=5, lr=[9.613333333333335e-06], mom=[[0.9, 0.999]] [2022-12-16 21:49:44,388] [INFO] [timer.py:197:stop] 0/1360, RunningAvgSamplesPerSec=6.324255960679552, CurrSamplesPerSec=5.670191860369816, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:49:55,738] [INFO] [timer.py:197:stop] 0/1362, RunningAvgSamplesPerSec=6.324254309513514, CurrSamplesPerSec=5.6942959812107965, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:50:07,114] [INFO] [timer.py:197:stop] 0/1364, RunningAvgSamplesPerSec=6.324256979747847, CurrSamplesPerSec=5.6949874818050725, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:50:18,496] [INFO] [timer.py:197:stop] 0/1366, RunningAvgSamplesPerSec=6.324236172921695, CurrSamplesPerSec=5.687761122532687, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:50:29,947] [INFO] [timer.py:197:stop] 0/1368, RunningAvgSamplesPerSec=6.3242387362212655, CurrSamplesPerSec=5.692255077058373, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:50:41,282] [INFO] [timer.py:197:stop] 0/1370, RunningAvgSamplesPerSec=6.324249802937796, CurrSamplesPerSec=5.703811950695486, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:50:52,601] [INFO] [timer.py:197:stop] 0/1372, RunningAvgSamplesPerSec=6.32427068603822, CurrSamplesPerSec=5.704097261816972, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:51:03,926] [INFO] [timer.py:197:stop] 0/1374, RunningAvgSamplesPerSec=6.3242886615535285, CurrSamplesPerSec=5.698351457952698, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:51:15,332] [INFO] [timer.py:197:stop] 0/1376, RunningAvgSamplesPerSec=6.324278565827469, CurrSamplesPerSec=5.688882378479673, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:51:26,682] [INFO] [timer.py:197:stop] 0/1378, RunningAvgSamplesPerSec=6.324272743331345, CurrSamplesPerSec=5.677133235207755, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:51:38,045] [INFO] [logging.py:68:log_dist] [Rank 0] step=690, skipped=5, lr=[9.591111111111113e-06], mom=[[0.9, 0.999]] [2022-12-16 21:51:38,047] [INFO] [timer.py:197:stop] 0/1380, RunningAvgSamplesPerSec=6.324266336132586, CurrSamplesPerSec=5.672529097773243, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:51:49,338] [INFO] [timer.py:197:stop] 0/1382, RunningAvgSamplesPerSec=6.3242997387905024, CurrSamplesPerSec=5.70745550803231, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:52:00,686] [INFO] [timer.py:197:stop] 0/1384, RunningAvgSamplesPerSec=6.324323307663187, CurrSamplesPerSec=5.692134373626455, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:52:12,025] [INFO] [timer.py:197:stop] 0/1386, RunningAvgSamplesPerSec=6.324327041155843, CurrSamplesPerSec=5.6880930415204345, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:52:23,323] [INFO] [timer.py:197:stop] 0/1388, RunningAvgSamplesPerSec=6.324380802604177, CurrSamplesPerSec=5.735462831952154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:52:34,646] [INFO] [timer.py:197:stop] 0/1390, RunningAvgSamplesPerSec=6.324399578264787, CurrSamplesPerSec=5.719052804513825, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:52:45,967] [INFO] [timer.py:197:stop] 0/1392, RunningAvgSamplesPerSec=6.3244345536156, CurrSamplesPerSec=5.704161503176972, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:52:57,335] [INFO] [timer.py:197:stop] 0/1394, RunningAvgSamplesPerSec=6.32447811176164, CurrSamplesPerSec=5.718143983956547, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:53:08,631] [INFO] [timer.py:197:stop] 0/1396, RunningAvgSamplesPerSec=6.324519518597146, CurrSamplesPerSec=5.714886786951273, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:53:19,938] [INFO] [timer.py:197:stop] 0/1398, RunningAvgSamplesPerSec=6.32455268106817, CurrSamplesPerSec=5.7012866044767625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:53:31,265] [INFO] [logging.py:68:log_dist] [Rank 0] step=700, skipped=5, lr=[9.56888888888889e-06], mom=[[0.9, 0.999]] [2022-12-16 21:53:31,267] [INFO] [timer.py:197:stop] 0/1400, RunningAvgSamplesPerSec=6.324579534160159, CurrSamplesPerSec=5.692990030015316, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0371, 'learning_rate': 9.56888888888889e-06, 'epoch': 2.97} [2022-12-16 21:53:42,558] [INFO] [timer.py:197:stop] 0/1402, RunningAvgSamplesPerSec=6.32462746208894, CurrSamplesPerSec=5.7200996461464815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:53:53,895] [INFO] [timer.py:197:stop] 0/1404, RunningAvgSamplesPerSec=6.324632008464894, CurrSamplesPerSec=5.690293315377784, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:54:05,179] [INFO] [timer.py:197:stop] 0/1406, 
RunningAvgSamplesPerSec=6.324698640337143, CurrSamplesPerSec=5.726335492400019, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:54:16,499] [INFO] [timer.py:197:stop] 0/1408, RunningAvgSamplesPerSec=6.324731919832654, CurrSamplesPerSec=5.702019773539062, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:54:27,822] [INFO] [timer.py:197:stop] 0/1410, RunningAvgSamplesPerSec=6.3247630088837274, CurrSamplesPerSec=5.724249100746272, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:54:39,170] [INFO] [timer.py:197:stop] 0/1412, RunningAvgSamplesPerSec=6.3247716831156175, CurrSamplesPerSec=5.682469154661037, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:54:50,513] [INFO] [timer.py:197:stop] 0/1414, RunningAvgSamplesPerSec=6.32478804755692, CurrSamplesPerSec=5.704216776146671, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:54:59,057] [INFO] [timer.py:197:stop] 0/1416, RunningAvgSamplesPerSec=6.32697491966737, CurrSamplesPerSec=10.183929802384258, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:55:10,405] [INFO] [timer.py:197:stop] 0/1418, RunningAvgSamplesPerSec=6.326984384492439, CurrSamplesPerSec=5.698068656563964, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:55:21,716] [INFO] [logging.py:68:log_dist] [Rank 0] step=710, skipped=5, lr=[9.546666666666668e-06], mom=[[0.9, 0.999]] [2022-12-16 21:55:21,718] [INFO] [timer.py:197:stop] 0/1420, RunningAvgSamplesPerSec=6.327008366408901, CurrSamplesPerSec=5.722761494483829, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:55:33,025] [INFO] [timer.py:197:stop] 0/1422, RunningAvgSamplesPerSec=6.32703881467068, CurrSamplesPerSec=5.725638068365156, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:55:44,329] [INFO] [timer.py:197:stop] 0/1424, RunningAvgSamplesPerSec=6.3270593471696595, CurrSamplesPerSec=5.7123326634216784, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:55:55,642] [INFO] [timer.py:197:stop] 0/1426, 
RunningAvgSamplesPerSec=6.327096793945078, CurrSamplesPerSec=5.710449849131292, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:56:06,990] [INFO] [timer.py:197:stop] 0/1428, RunningAvgSamplesPerSec=6.327105470684101, CurrSamplesPerSec=5.6794182809613725, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:56:18,294] [INFO] [timer.py:197:stop] 0/1430, RunningAvgSamplesPerSec=6.3271388039928596, CurrSamplesPerSec=5.714867320181734, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:56:29,628] [INFO] [timer.py:197:stop] 0/1432, RunningAvgSamplesPerSec=6.3271327648059374, CurrSamplesPerSec=5.6914160526940245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:56:40,960] [INFO] [timer.py:197:stop] 0/1434, RunningAvgSamplesPerSec=6.327141902219056, CurrSamplesPerSec=5.689690747652517, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:56:52,302] [INFO] [timer.py:197:stop] 0/1436, RunningAvgSamplesPerSec=6.327156027945619, CurrSamplesPerSec=5.689736092584112, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:57:03,628] [INFO] [timer.py:197:stop] 0/1438, RunningAvgSamplesPerSec=6.3271860924201, CurrSamplesPerSec=5.701005206768365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:57:14,933] [INFO] [logging.py:68:log_dist] [Rank 0] step=720, skipped=5, lr=[9.524444444444445e-06], mom=[[0.9, 0.999]] [2022-12-16 21:57:14,935] [INFO] [timer.py:197:stop] 0/1440, RunningAvgSamplesPerSec=6.327240541450005, CurrSamplesPerSec=5.733791309617626, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:57:26,263] [INFO] [timer.py:197:stop] 0/1442, RunningAvgSamplesPerSec=6.327265439071706, CurrSamplesPerSec=5.709909562214138, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:57:37,604] [INFO] [timer.py:197:stop] 0/1444, RunningAvgSamplesPerSec=6.327289199225549, CurrSamplesPerSec=5.7069866443111605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:57:49,080] [INFO] [timer.py:197:stop] 0/1446, 
RunningAvgSamplesPerSec=6.327306402988852, CurrSamplesPerSec=5.691615407194863, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:58:00,419] [INFO] [timer.py:197:stop] 0/1448, RunningAvgSamplesPerSec=6.327318797851507, CurrSamplesPerSec=5.702858777371699, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:58:11,758] [INFO] [timer.py:197:stop] 0/1450, RunningAvgSamplesPerSec=6.3273351798091335, CurrSamplesPerSec=5.695963648425355, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0227, 'learning_rate': 9.513333333333334e-06, 'epoch': 3.07} [2022-12-16 21:58:23,085] [INFO] [timer.py:197:stop] 0/1452, RunningAvgSamplesPerSec=6.32736439156718, CurrSamplesPerSec=5.69986222339413, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:58:34,423] [INFO] [timer.py:197:stop] 0/1454, RunningAvgSamplesPerSec=6.327382444473141, CurrSamplesPerSec=5.691851947598792, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:58:45,728] [INFO] [timer.py:197:stop] 0/1456, RunningAvgSamplesPerSec=6.327411355372915, CurrSamplesPerSec=5.7264754862424025, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:58:57,189] [INFO] [timer.py:197:stop] 0/1458, RunningAvgSamplesPerSec=6.327422689770682, CurrSamplesPerSec=5.687144391209955, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:59:08,503] [INFO] [logging.py:68:log_dist] [Rank 0] step=730, skipped=5, lr=[9.502222222222223e-06], mom=[[0.9, 0.999]] [2022-12-16 21:59:08,505] [INFO] [timer.py:197:stop] 0/1460, RunningAvgSamplesPerSec=6.327457442468501, CurrSamplesPerSec=5.7197886003305705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:59:19,811] [INFO] [timer.py:197:stop] 0/1462, RunningAvgSamplesPerSec=6.327485371631291, CurrSamplesPerSec=5.720289800531883, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:59:31,139] [INFO] [timer.py:197:stop] 0/1464, RunningAvgSamplesPerSec=6.327510730275065, CurrSamplesPerSec=5.717306565446505, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-16 21:59:42,470] [INFO] [timer.py:197:stop] 0/1466, RunningAvgSamplesPerSec=6.327533540895815, CurrSamplesPerSec=5.704577773769327, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:59:53,873] [INFO] [timer.py:197:stop] 0/1468, RunningAvgSamplesPerSec=6.327495467322438, CurrSamplesPerSec=5.645587947967342, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:00:05,302] [INFO] [timer.py:197:stop] 0/1470, RunningAvgSamplesPerSec=6.327436755212707, CurrSamplesPerSec=5.624859129929112, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:00:16,701] [INFO] [timer.py:197:stop] 0/1472, RunningAvgSamplesPerSec=6.3274364325893995, CurrSamplesPerSec=5.687119088560816, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:00:28,151] [INFO] [timer.py:197:stop] 0/1474, RunningAvgSamplesPerSec=6.327424697035128, CurrSamplesPerSec=5.67494696401265, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:00:39,471] [INFO] [timer.py:197:stop] 0/1476, RunningAvgSamplesPerSec=6.327443763191702, CurrSamplesPerSec=5.70665906786341, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:00:50,872] [INFO] [timer.py:197:stop] 0/1478, RunningAvgSamplesPerSec=6.327434800015111, CurrSamplesPerSec=5.677509785642671, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:01:02,243] [INFO] [logging.py:68:log_dist] [Rank 0] step=740, skipped=5, lr=[9.48e-06], mom=[[0.9, 0.999]] [2022-12-16 22:01:02,245] [INFO] [timer.py:197:stop] 0/1480, RunningAvgSamplesPerSec=6.327453438484531, CurrSamplesPerSec=5.707022801349469, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:01:13,631] [INFO] [timer.py:197:stop] 0/1482, RunningAvgSamplesPerSec=6.327449584064208, CurrSamplesPerSec=5.686699097472274, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:01:25,011] [INFO] [timer.py:197:stop] 0/1484, RunningAvgSamplesPerSec=6.327461812433459, CurrSamplesPerSec=5.6979624620946625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:01:36,382] 
[INFO] [timer.py:197:stop] 0/1486, RunningAvgSamplesPerSec=6.327474119248579, CurrSamplesPerSec=5.7187698940143425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:01:47,707] [INFO] [timer.py:197:stop] 0/1488, RunningAvgSamplesPerSec=6.327500660405285, CurrSamplesPerSec=5.706598409685004, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:01:59,104] [INFO] [timer.py:197:stop] 0/1490, RunningAvgSamplesPerSec=6.327497573482637, CurrSamplesPerSec=5.683719497130948, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:02:10,486] [INFO] [timer.py:197:stop] 0/1492, RunningAvgSamplesPerSec=6.32748801533402, CurrSamplesPerSec=5.683161635883977, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:02:21,823] [INFO] [timer.py:197:stop] 0/1494, RunningAvgSamplesPerSec=6.32750880749429, CurrSamplesPerSec=5.711333623231045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:02:33,231] [INFO] [timer.py:197:stop] 0/1496, RunningAvgSamplesPerSec=6.327513699728768, CurrSamplesPerSec=5.706028528414905, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:02:44,547] [INFO] [timer.py:197:stop] 0/1498, RunningAvgSamplesPerSec=6.327550204303378, CurrSamplesPerSec=5.724850952803729, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:02:56,026] [INFO] [logging.py:68:log_dist] [Rank 0] step=750, skipped=5, lr=[9.457777777777778e-06], mom=[[0.9, 0.999]] [2022-12-16 22:02:56,027] [INFO] [timer.py:197:stop] 0/1500, RunningAvgSamplesPerSec=6.327458217506727, CurrSamplesPerSec=5.5676383899106145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0173, 'learning_rate': 9.457777777777778e-06, 'epoch': 3.18} [2022-12-16 22:03:07,403] [INFO] [timer.py:197:stop] 0/1502, RunningAvgSamplesPerSec=6.327455446968343, CurrSamplesPerSec=5.696816345954642, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:03:18,794] [INFO] [timer.py:197:stop] 0/1504, RunningAvgSamplesPerSec=6.327460698300773, CurrSamplesPerSec=5.6907456868615665, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:03:30,167] [INFO] [timer.py:197:stop] 0/1506, RunningAvgSamplesPerSec=6.327476845175162, CurrSamplesPerSec=5.700586309832747, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:03:41,557] [INFO] [timer.py:197:stop] 0/1508, RunningAvgSamplesPerSec=6.327479752759413, CurrSamplesPerSec=5.698833422003168, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:03:53,085] [INFO] [timer.py:197:stop] 0/1510, RunningAvgSamplesPerSec=6.327483658107985, CurrSamplesPerSec=5.690109492159518, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:04:04,525] [INFO] [timer.py:197:stop] 0/1512, RunningAvgSamplesPerSec=6.327482981825684, CurrSamplesPerSec=5.691443324337489, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:04:15,856] [INFO] [timer.py:197:stop] 0/1514, RunningAvgSamplesPerSec=6.327504069039711, CurrSamplesPerSec=5.702331554684467, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:04:27,180] [INFO] [timer.py:197:stop] 0/1516, RunningAvgSamplesPerSec=6.32752931475808, CurrSamplesPerSec=5.699813812273017, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:04:38,838] [INFO] [timer.py:197:stop] 0/1518, RunningAvgSamplesPerSec=6.327495958559439, CurrSamplesPerSec=5.70111127252215, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:04:50,193] [INFO] [logging.py:68:log_dist] [Rank 0] step=760, skipped=5, lr=[9.435555555555556e-06], mom=[[0.9, 0.999]] [2022-12-16 22:04:50,195] [INFO] [timer.py:197:stop] 0/1520, RunningAvgSamplesPerSec=6.327491850184652, CurrSamplesPerSec=5.6779257782447745, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:05:01,599] [INFO] [timer.py:197:stop] 0/1522, RunningAvgSamplesPerSec=6.3275112010285595, CurrSamplesPerSec=5.704700217859312, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:05:13,081] [INFO] [timer.py:197:stop] 0/1524, RunningAvgSamplesPerSec=6.327502103613446, CurrSamplesPerSec=5.698904562156051, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:05:24,417] [INFO] [timer.py:197:stop] 0/1526, RunningAvgSamplesPerSec=6.327493535489943, CurrSamplesPerSec=5.677983426370301, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:05:35,707] [INFO] [timer.py:197:stop] 0/1528, RunningAvgSamplesPerSec=6.327507103762715, CurrSamplesPerSec=5.697476050095679, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:05:47,074] [INFO] [timer.py:197:stop] 0/1530, RunningAvgSamplesPerSec=6.327507935110874, CurrSamplesPerSec=5.715472068638608, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:05:58,447] [INFO] [timer.py:197:stop] 0/1532, RunningAvgSamplesPerSec=6.32751138193243, CurrSamplesPerSec=5.6919378793894335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:06:09,787] [INFO] [timer.py:197:stop] 0/1534, RunningAvgSamplesPerSec=6.327521338018502, CurrSamplesPerSec=5.7004705792212755, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:06:21,190] [INFO] [timer.py:197:stop] 0/1536, RunningAvgSamplesPerSec=6.3274682129513, CurrSamplesPerSec=5.712174397994211, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:06:32,541] [INFO] [timer.py:197:stop] 0/1538, RunningAvgSamplesPerSec=6.3274698474125985, CurrSamplesPerSec=5.68027300306161, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:06:43,888] [INFO] [logging.py:68:log_dist] [Rank 0] step=770, skipped=5, lr=[9.413333333333334e-06], mom=[[0.9, 0.999]] [2022-12-16 22:06:43,890] [INFO] [timer.py:197:stop] 0/1540, RunningAvgSamplesPerSec=6.327473245108575, CurrSamplesPerSec=5.6981792094536186, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:06:55,564] [INFO] [timer.py:197:stop] 0/1542, RunningAvgSamplesPerSec=6.327513329572601, CurrSamplesPerSec=5.718835197312315, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:07:07,225] [INFO] [timer.py:197:stop] 0/1544, RunningAvgSamplesPerSec=6.327515501200996, CurrSamplesPerSec=5.701398008835235, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:07:18,958] [INFO] [timer.py:197:stop] 0/1546, RunningAvgSamplesPerSec=6.3272162957930815, CurrSamplesPerSec=5.309439632999244, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:07:30,285] [INFO] [timer.py:197:stop] 0/1548, RunningAvgSamplesPerSec=6.327237769724863, CurrSamplesPerSec=5.715100200108538, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:07:41,816] [INFO] [timer.py:197:stop] 0/1550, RunningAvgSamplesPerSec=6.327274161216004, CurrSamplesPerSec=5.71891780324732, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0172, 'learning_rate': 9.402222222222222e-06, 'epoch': 3.28} [2022-12-16 22:07:53,410] [INFO] [timer.py:197:stop] 0/1552, RunningAvgSamplesPerSec=6.32706992914816, CurrSamplesPerSec=5.411217698198677, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:08:04,941] [INFO] [timer.py:197:stop] 0/1554, RunningAvgSamplesPerSec=6.327092359809133, CurrSamplesPerSec=5.700676379585338, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:08:16,285] [INFO] [timer.py:197:stop] 0/1556, RunningAvgSamplesPerSec=6.327101079548158, CurrSamplesPerSec=5.68027853219152, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:08:27,892] [INFO] [timer.py:197:stop] 0/1558, RunningAvgSamplesPerSec=6.326898704258696, CurrSamplesPerSec=5.425364903600726, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:08:39,298] [INFO] [logging.py:68:log_dist] [Rank 0] step=780, skipped=5, lr=[9.391111111111111e-06], mom=[[0.9, 0.999]] [2022-12-16 22:08:39,300] [INFO] [timer.py:197:stop] 0/1560, RunningAvgSamplesPerSec=6.326863672241038, CurrSamplesPerSec=5.64888880789383, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:08:50,744] [INFO] [timer.py:197:stop] 0/1562, RunningAvgSamplesPerSec=6.326881633242561, CurrSamplesPerSec=5.688473217102632, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:09:02,169] [INFO] [timer.py:197:stop] 0/1564, 
RunningAvgSamplesPerSec=6.326828353757074, CurrSamplesPerSec=5.6164079889420755, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:09:13,635] [INFO] [timer.py:197:stop] 0/1566, RunningAvgSamplesPerSec=6.32684329058874, CurrSamplesPerSec=5.711484793733482, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:09:25,042] [INFO] [timer.py:197:stop] 0/1568, RunningAvgSamplesPerSec=6.326860661533723, CurrSamplesPerSec=5.69864977247261, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:09:36,536] [INFO] [timer.py:197:stop] 0/1570, RunningAvgSamplesPerSec=6.326751882842448, CurrSamplesPerSec=5.55672099720942, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:09:48,053] [INFO] [timer.py:197:stop] 0/1572, RunningAvgSamplesPerSec=6.326774718534809, CurrSamplesPerSec=5.70892423918322, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:09:59,564] [INFO] [timer.py:197:stop] 0/1574, RunningAvgSamplesPerSec=6.32673483292785, CurrSamplesPerSec=5.659623979889483, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:10:10,906] [INFO] [timer.py:197:stop] 0/1576, RunningAvgSamplesPerSec=6.326735484116569, CurrSamplesPerSec=5.687612892361142, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:10:22,483] [INFO] [timer.py:197:stop] 0/1578, RunningAvgSamplesPerSec=6.326743472057318, CurrSamplesPerSec=5.687099328553298, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:10:34,085] [INFO] [logging.py:68:log_dist] [Rank 0] step=790, skipped=5, lr=[9.368888888888889e-06], mom=[[0.9, 0.999]] [2022-12-16 22:10:34,086] [INFO] [timer.py:197:stop] 0/1580, RunningAvgSamplesPerSec=6.326741989967983, CurrSamplesPerSec=5.6911527620388345, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:10:45,432] [INFO] [timer.py:197:stop] 0/1582, RunningAvgSamplesPerSec=6.326739758130946, CurrSamplesPerSec=5.690173660070088, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:10:56,913] [INFO] [timer.py:197:stop] 0/1584, 
RunningAvgSamplesPerSec=6.326729646225332, CurrSamplesPerSec=5.6936201062598, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:11:08,620] [INFO] [timer.py:197:stop] 0/1586, RunningAvgSamplesPerSec=6.326719942273098, CurrSamplesPerSec=5.678790863330024, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:11:19,982] [INFO] [timer.py:197:stop] 0/1588, RunningAvgSamplesPerSec=6.326714577519651, CurrSamplesPerSec=5.6897175203704835, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:11:31,391] [INFO] [timer.py:197:stop] 0/1590, RunningAvgSamplesPerSec=6.326709002994419, CurrSamplesPerSec=5.695294385974962, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:11:42,904] [INFO] [timer.py:197:stop] 0/1592, RunningAvgSamplesPerSec=6.326720212022566, CurrSamplesPerSec=5.700918516686109, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:11:54,288] [INFO] [timer.py:197:stop] 0/1594, RunningAvgSamplesPerSec=6.326701126454511, CurrSamplesPerSec=5.660456759611574, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:12:05,611] [INFO] [timer.py:197:stop] 0/1596, RunningAvgSamplesPerSec=6.326729466778808, CurrSamplesPerSec=5.713571618248627, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:12:16,916] [INFO] [timer.py:197:stop] 0/1598, RunningAvgSamplesPerSec=6.3267459876247045, CurrSamplesPerSec=5.7029544923126725, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:12:28,446] [INFO] [logging.py:68:log_dist] [Rank 0] step=800, skipped=5, lr=[9.346666666666666e-06], mom=[[0.9, 0.999]] [2022-12-16 22:12:28,448] [INFO] [timer.py:197:stop] 0/1600, RunningAvgSamplesPerSec=6.326609933797941, CurrSamplesPerSec=5.5002245491774175, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0166, 'learning_rate': 9.346666666666666e-06, 'epoch': 3.39} [2022-12-16 22:12:39,770] [INFO] [timer.py:197:stop] 0/1602, RunningAvgSamplesPerSec=6.326638422676115, CurrSamplesPerSec=5.707492641836485, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-16 22:12:51,065] [INFO] [timer.py:197:stop] 0/1604, RunningAvgSamplesPerSec=6.3266746590610845, CurrSamplesPerSec=5.729000214488998, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:13:02,571] [INFO] [timer.py:197:stop] 0/1606, RunningAvgSamplesPerSec=6.326559939629545, CurrSamplesPerSec=5.546773415140327, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:13:13,872] [INFO] [timer.py:197:stop] 0/1608, RunningAvgSamplesPerSec=6.326604435719887, CurrSamplesPerSec=5.730451430762789, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:13:25,198] [INFO] [timer.py:197:stop] 0/1610, RunningAvgSamplesPerSec=6.326630818586847, CurrSamplesPerSec=5.708273048903529, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:13:36,873] [INFO] [timer.py:197:stop] 0/1612, RunningAvgSamplesPerSec=6.32665586766633, CurrSamplesPerSec=5.703729295796363, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:13:48,186] [INFO] [timer.py:197:stop] 0/1614, RunningAvgSamplesPerSec=6.3266910772859575, CurrSamplesPerSec=5.71860006368878, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:13:59,799] [INFO] [timer.py:197:stop] 0/1616, RunningAvgSamplesPerSec=6.326687842512467, CurrSamplesPerSec=5.6908894957427, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:14:11,397] [INFO] [timer.py:197:stop] 0/1618, RunningAvgSamplesPerSec=6.326692582176606, CurrSamplesPerSec=5.701328259509893, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:14:22,723] [INFO] [logging.py:68:log_dist] [Rank 0] step=810, skipped=5, lr=[9.324444444444444e-06], mom=[[0.9, 0.999]] [2022-12-16 22:14:22,725] [INFO] [timer.py:197:stop] 0/1620, RunningAvgSamplesPerSec=6.326714089116822, CurrSamplesPerSec=5.699270695864197, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:14:34,324] [INFO] [timer.py:197:stop] 0/1622, RunningAvgSamplesPerSec=6.326731860344668, CurrSamplesPerSec=5.691813568788928, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 
22:14:46,047] [INFO] [timer.py:197:stop] 0/1624, RunningAvgSamplesPerSec=6.326712534725961, CurrSamplesPerSec=5.690206468282752, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:14:57,368] [INFO] [timer.py:197:stop] 0/1626, RunningAvgSamplesPerSec=6.326740345466834, CurrSamplesPerSec=5.712110948419259, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:15:08,686] [INFO] [timer.py:197:stop] 0/1628, RunningAvgSamplesPerSec=6.326758798477797, CurrSamplesPerSec=5.6973323918270635, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:15:20,061] [INFO] [timer.py:197:stop] 0/1630, RunningAvgSamplesPerSec=6.326747846596995, CurrSamplesPerSec=5.691386609266169, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:15:31,412] [INFO] [timer.py:197:stop] 0/1632, RunningAvgSamplesPerSec=6.3267535500364955, CurrSamplesPerSec=5.690381612642104, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:15:42,701] [INFO] [timer.py:197:stop] 0/1634, RunningAvgSamplesPerSec=6.326795979692171, CurrSamplesPerSec=5.725662737910497, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:15:54,393] [INFO] [timer.py:197:stop] 0/1636, RunningAvgSamplesPerSec=6.326834150424974, CurrSamplesPerSec=5.719259705105001, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:16:05,823] [INFO] [timer.py:197:stop] 0/1638, RunningAvgSamplesPerSec=6.326843525216495, CurrSamplesPerSec=5.697779594336017, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:16:17,363] [INFO] [logging.py:68:log_dist] [Rank 0] step=820, skipped=5, lr=[9.302222222222223e-06], mom=[[0.9, 0.999]] [2022-12-16 22:16:17,364] [INFO] [timer.py:197:stop] 0/1640, RunningAvgSamplesPerSec=6.326679026146229, CurrSamplesPerSec=5.458330096859955, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:16:28,822] [INFO] [timer.py:197:stop] 0/1642, RunningAvgSamplesPerSec=6.326724094955409, CurrSamplesPerSec=5.730091554512378, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 
22:16:40,240] [INFO] [timer.py:197:stop] 0/1644, RunningAvgSamplesPerSec=6.326761135677585, CurrSamplesPerSec=5.722593622930806, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:16:51,697] [INFO] [timer.py:197:stop] 0/1646, RunningAvgSamplesPerSec=6.326697672667059, CurrSamplesPerSec=5.612376038068795, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:17:03,218] [INFO] [timer.py:197:stop] 0/1648, RunningAvgSamplesPerSec=6.326712992902165, CurrSamplesPerSec=5.70818492382536, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:17:14,675] [INFO] [timer.py:197:stop] 0/1650, RunningAvgSamplesPerSec=6.32673517454783, CurrSamplesPerSec=5.714327900088595, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0196, 'learning_rate': 9.291111111111112e-06, 'epoch': 3.5} [2022-12-16 22:17:26,082] [INFO] [timer.py:197:stop] 0/1652, RunningAvgSamplesPerSec=6.326699396447215, CurrSamplesPerSec=5.621307520169431, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:17:37,569] [INFO] [timer.py:197:stop] 0/1654, RunningAvgSamplesPerSec=6.3267304307190875, CurrSamplesPerSec=5.707525164650132, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:17:49,057] [INFO] [timer.py:197:stop] 0/1656, RunningAvgSamplesPerSec=6.326770394708087, CurrSamplesPerSec=5.701385415080797, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:18:00,478] [INFO] [timer.py:197:stop] 0/1658, RunningAvgSamplesPerSec=6.326760667852926, CurrSamplesPerSec=5.662375782745366, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:18:11,888] [INFO] [logging.py:68:log_dist] [Rank 0] step=830, skipped=5, lr=[9.280000000000001e-06], mom=[[0.9, 0.999]] [2022-12-16 22:18:11,890] [INFO] [timer.py:197:stop] 0/1660, RunningAvgSamplesPerSec=6.326821622093543, CurrSamplesPerSec=5.7435494700991665, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:18:23,408] [INFO] [timer.py:197:stop] 0/1662, RunningAvgSamplesPerSec=6.326843356419004, CurrSamplesPerSec=5.699876262773075, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:18:34,855] [INFO] [timer.py:197:stop] 0/1664, RunningAvgSamplesPerSec=6.326798218964038, CurrSamplesPerSec=5.607935810568727, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:18:46,396] [INFO] [timer.py:197:stop] 0/1666, RunningAvgSamplesPerSec=6.326834587271337, CurrSamplesPerSec=5.716924474281732, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:18:57,827] [INFO] [timer.py:197:stop] 0/1668, RunningAvgSamplesPerSec=6.326838624746808, CurrSamplesPerSec=5.694305403046297, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:19:09,201] [INFO] [timer.py:197:stop] 0/1670, RunningAvgSamplesPerSec=6.326862994715842, CurrSamplesPerSec=5.696425141989684, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:19:20,826] [INFO] [timer.py:197:stop] 0/1672, RunningAvgSamplesPerSec=6.326887429548338, CurrSamplesPerSec=5.722593622930806, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:19:32,408] [INFO] [timer.py:197:stop] 0/1674, RunningAvgSamplesPerSec=6.326924458197643, CurrSamplesPerSec=5.73617883162889, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:19:43,753] [INFO] [timer.py:197:stop] 0/1676, RunningAvgSamplesPerSec=6.326932090614424, CurrSamplesPerSec=5.685106201292763, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:19:55,356] [INFO] [timer.py:197:stop] 0/1678, RunningAvgSamplesPerSec=6.326963550525718, CurrSamplesPerSec=5.708342482699776, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:20:06,777] [INFO] [logging.py:68:log_dist] [Rank 0] step=840, skipped=5, lr=[9.257777777777779e-06], mom=[[0.9, 0.999]] [2022-12-16 22:20:06,779] [INFO] [timer.py:197:stop] 0/1680, RunningAvgSamplesPerSec=6.326964525629489, CurrSamplesPerSec=5.694321106174752, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:20:18,115] [INFO] [timer.py:197:stop] 0/1682, RunningAvgSamplesPerSec=6.326980760880848, CurrSamplesPerSec=5.6901266195429265, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:20:29,491] [INFO] [timer.py:197:stop] 0/1684, RunningAvgSamplesPerSec=6.327030240339728, CurrSamplesPerSec=5.727282599924787, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:20:40,822] [INFO] [timer.py:197:stop] 0/1686, RunningAvgSamplesPerSec=6.327050807508987, CurrSamplesPerSec=5.70611585915527, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:20:52,189] [INFO] [timer.py:197:stop] 0/1688, RunningAvgSamplesPerSec=6.327042771339603, CurrSamplesPerSec=5.677036223927891, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:21:03,656] [INFO] [timer.py:197:stop] 0/1690, RunningAvgSamplesPerSec=6.327078544945673, CurrSamplesPerSec=5.716653948578718, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:21:14,994] [INFO] [timer.py:197:stop] 0/1692, RunningAvgSamplesPerSec=6.327080143901506, CurrSamplesPerSec=5.695177661829884, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:21:26,497] [INFO] [timer.py:197:stop] 0/1694, RunningAvgSamplesPerSec=6.3269586642354865, CurrSamplesPerSec=5.512899898920243, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:21:37,801] [INFO] [timer.py:197:stop] 0/1696, RunningAvgSamplesPerSec=6.326997931656717, CurrSamplesPerSec=5.730388308477891, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:21:49,125] [INFO] [timer.py:197:stop] 0/1698, RunningAvgSamplesPerSec=6.327020896294555, CurrSamplesPerSec=5.703355560734074, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:22:00,877] [INFO] [logging.py:68:log_dist] [Rank 0] step=850, skipped=5, lr=[9.235555555555556e-06], mom=[[0.9, 0.999]] [2022-12-16 22:22:00,879] [INFO] [timer.py:197:stop] 0/1700, RunningAvgSamplesPerSec=6.327048230282517, CurrSamplesPerSec=5.725344248442001, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0166, 'learning_rate': 9.235555555555556e-06, 'epoch': 3.6} [2022-12-16 22:22:12,166] [INFO] [timer.py:197:stop] 0/1702, 
RunningAvgSamplesPerSec=6.327088307183525, CurrSamplesPerSec=5.719386679782303, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:22:23,543] [INFO] [timer.py:197:stop] 0/1704, RunningAvgSamplesPerSec=6.327120891773229, CurrSamplesPerSec=5.709233133050226, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:22:35,131] [INFO] [timer.py:197:stop] 0/1706, RunningAvgSamplesPerSec=6.327090682954414, CurrSamplesPerSec=5.721107852338529, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:22:46,419] [INFO] [timer.py:197:stop] 0/1708, RunningAvgSamplesPerSec=6.327128147702684, CurrSamplesPerSec=5.719666238913165, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:22:57,727] [INFO] [timer.py:197:stop] 0/1710, RunningAvgSamplesPerSec=6.327163186849155, CurrSamplesPerSec=5.71158128455646, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:23:09,157] [INFO] [timer.py:197:stop] 0/1712, RunningAvgSamplesPerSec=6.3271082387375905, CurrSamplesPerSec=5.706069039842615, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:23:20,498] [INFO] [timer.py:197:stop] 0/1714, RunningAvgSamplesPerSec=6.327118251822114, CurrSamplesPerSec=5.709560033560292, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:23:31,806] [INFO] [timer.py:197:stop] 0/1716, RunningAvgSamplesPerSec=6.32716470276775, CurrSamplesPerSec=5.730877909610886, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:23:43,096] [INFO] [timer.py:197:stop] 0/1718, RunningAvgSamplesPerSec=6.327212694616481, CurrSamplesPerSec=5.729023934851759, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:23:54,396] [INFO] [logging.py:68:log_dist] [Rank 0] step=860, skipped=5, lr=[9.213333333333334e-06], mom=[[0.9, 0.999]] [2022-12-16 22:23:54,398] [INFO] [timer.py:197:stop] 0/1720, RunningAvgSamplesPerSec=6.327252695395273, CurrSamplesPerSec=5.7282911397132175, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:24:05,684] [INFO] [timer.py:197:stop] 0/1722, 
RunningAvgSamplesPerSec=6.327292967660206, CurrSamplesPerSec=5.7332882305885375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:24:16,984] [INFO] [timer.py:197:stop] 0/1724, RunningAvgSamplesPerSec=6.327334122464313, CurrSamplesPerSec=5.705825252211353, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:24:28,430] [INFO] [timer.py:197:stop] 0/1726, RunningAvgSamplesPerSec=6.327362763895628, CurrSamplesPerSec=5.7203936594189075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:24:39,753] [INFO] [timer.py:197:stop] 0/1728, RunningAvgSamplesPerSec=6.327388228071011, CurrSamplesPerSec=5.7061243497864975, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:24:51,089] [INFO] [timer.py:197:stop] 0/1730, RunningAvgSamplesPerSec=6.327403411302091, CurrSamplesPerSec=5.6941735001864995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:25:02,398] [INFO] [timer.py:197:stop] 0/1732, RunningAvgSamplesPerSec=6.327439575613748, CurrSamplesPerSec=5.738262154434546, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:25:13,746] [INFO] [timer.py:197:stop] 0/1734, RunningAvgSamplesPerSec=6.327447907599327, CurrSamplesPerSec=5.701845849844518, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:25:25,056] [INFO] [timer.py:197:stop] 0/1736, RunningAvgSamplesPerSec=6.3274811248164715, CurrSamplesPerSec=5.718736755597752, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:25:36,352] [INFO] [timer.py:197:stop] 0/1738, RunningAvgSamplesPerSec=6.327525118665114, CurrSamplesPerSec=5.739397030686239, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:25:47,809] [INFO] [logging.py:68:log_dist] [Rank 0] step=870, skipped=5, lr=[9.191111111111111e-06], mom=[[0.9, 0.999]] [2022-12-16 22:25:47,810] [INFO] [timer.py:197:stop] 0/1740, RunningAvgSamplesPerSec=6.327570720702815, CurrSamplesPerSec=5.728890663197119, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:25:59,188] [INFO] [timer.py:197:stop] 0/1742, 
RunningAvgSamplesPerSec=6.327554770194363, CurrSamplesPerSec=5.671244133657371, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:26:10,526] [INFO] [timer.py:197:stop] 0/1744, RunningAvgSamplesPerSec=6.327555642826735, CurrSamplesPerSec=5.702799896105481, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:26:21,899] [INFO] [timer.py:197:stop] 0/1746, RunningAvgSamplesPerSec=6.3275406381037325, CurrSamplesPerSec=5.694381745222209, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:26:33,330] [INFO] [timer.py:197:stop] 0/1748, RunningAvgSamplesPerSec=6.3275595001379035, CurrSamplesPerSec=5.700081051921007, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:26:44,667] [INFO] [timer.py:197:stop] 0/1750, RunningAvgSamplesPerSec=6.3275690092916745, CurrSamplesPerSec=5.699173652540018, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0188, 'learning_rate': 9.180000000000002e-06, 'epoch': 3.71} [2022-12-16 22:26:56,176] [INFO] [timer.py:197:stop] 0/1752, RunningAvgSamplesPerSec=6.327599408724447, CurrSamplesPerSec=5.720589685059201, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:27:07,474] [INFO] [timer.py:197:stop] 0/1754, RunningAvgSamplesPerSec=6.327626901835506, CurrSamplesPerSec=5.717025532305319, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:27:19,040] [INFO] [timer.py:197:stop] 0/1756, RunningAvgSamplesPerSec=6.327463841076069, CurrSamplesPerSec=5.456514256106413, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:27:30,379] [INFO] [timer.py:197:stop] 0/1758, RunningAvgSamplesPerSec=6.327479196720069, CurrSamplesPerSec=5.714399670907678, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:27:41,726] [INFO] [logging.py:68:log_dist] [Rank 0] step=880, skipped=5, lr=[9.168888888888889e-06], mom=[[0.9, 0.999]] [2022-12-16 22:27:41,728] [INFO] [timer.py:197:stop] 0/1760, RunningAvgSamplesPerSec=6.327481260641632, CurrSamplesPerSec=5.699712393640301, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-16 22:27:53,131] [INFO] [timer.py:197:stop] 0/1762, RunningAvgSamplesPerSec=6.3274335041248, CurrSamplesPerSec=5.595196403806098, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:28:04,504] [INFO] [timer.py:197:stop] 0/1764, RunningAvgSamplesPerSec=6.327418345128578, CurrSamplesPerSec=5.674870902112147, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:28:16,055] [INFO] [timer.py:197:stop] 0/1766, RunningAvgSamplesPerSec=6.327419979238297, CurrSamplesPerSec=5.698506538894324, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:28:27,389] [INFO] [timer.py:197:stop] 0/1768, RunningAvgSamplesPerSec=6.327411032748365, CurrSamplesPerSec=5.686275554833261, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:28:38,668] [INFO] [timer.py:197:stop] 0/1770, RunningAvgSamplesPerSec=6.327451486650583, CurrSamplesPerSec=5.7193138087887085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:28:50,097] [INFO] [timer.py:197:stop] 0/1772, RunningAvgSamplesPerSec=6.327448629806222, CurrSamplesPerSec=5.677691594771782, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:29:01,797] [INFO] [timer.py:197:stop] 0/1774, RunningAvgSamplesPerSec=6.327447834994487, CurrSamplesPerSec=5.693589432396962, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:29:13,137] [INFO] [timer.py:197:stop] 0/1776, RunningAvgSamplesPerSec=6.3274420899994555, CurrSamplesPerSec=5.6716673572904135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:29:24,741] [INFO] [timer.py:197:stop] 0/1778, RunningAvgSamplesPerSec=6.327465683861914, CurrSamplesPerSec=5.705979769914438, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:29:36,093] [INFO] [logging.py:68:log_dist] [Rank 0] step=890, skipped=5, lr=[9.146666666666667e-06], mom=[[0.9, 0.999]]
[2022-12-16 22:29:36,094] [INFO] [timer.py:197:stop] 0/1780, RunningAvgSamplesPerSec=6.327472717879045, CurrSamplesPerSec=5.712534458483042, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:29:47,438] [INFO] [timer.py:197:stop] 0/1782, RunningAvgSamplesPerSec=6.327477872742945, CurrSamplesPerSec=5.687404901098838, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:29:58,859] [INFO] [timer.py:197:stop] 0/1784, RunningAvgSamplesPerSec=6.327497107616166, CurrSamplesPerSec=5.715082435349324, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:30:10,346] [INFO] [timer.py:197:stop] 0/1786, RunningAvgSamplesPerSec=6.327481575537458, CurrSamplesPerSec=5.696479056278363, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:30:21,630] [INFO] [timer.py:197:stop] 0/1788, RunningAvgSamplesPerSec=6.327516818382441, CurrSamplesPerSec=5.706524893701314, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:30:33,134] [INFO] [timer.py:197:stop] 0/1790, RunningAvgSamplesPerSec=6.327544074013899, CurrSamplesPerSec=5.719948994340342, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:30:44,768] [INFO] [timer.py:197:stop] 0/1792, RunningAvgSamplesPerSec=6.327517602571258, CurrSamplesPerSec=5.67557857248875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:30:56,109] [INFO] [timer.py:197:stop] 0/1794, RunningAvgSamplesPerSec=6.327513822697181, CurrSamplesPerSec=5.687254038624659, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:31:07,643] [INFO] [timer.py:197:stop] 0/1796, RunningAvgSamplesPerSec=6.327525078782281, CurrSamplesPerSec=5.715674085571417, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:31:19,212] [INFO] [timer.py:197:stop] 0/1798, RunningAvgSamplesPerSec=6.327477564547856, CurrSamplesPerSec=5.673090628417221, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:31:30,491] [INFO] [logging.py:68:log_dist] [Rank 0] step=900, skipped=5, lr=[9.124444444444444e-06], mom=[[0.9, 0.999]]
[2022-12-16 22:31:30,493] [INFO] [timer.py:197:stop] 0/1800, RunningAvgSamplesPerSec=6.327505898154873, CurrSamplesPerSec=5.719943143944835, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0201, 'learning_rate': 9.124444444444444e-06, 'epoch': 3.81}
[2022-12-16 22:31:42,020] [INFO] [timer.py:197:stop] 0/1802, RunningAvgSamplesPerSec=6.327532702887273, CurrSamplesPerSec=5.720189114647878, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:31:53,397] [INFO] [timer.py:197:stop] 0/1804, RunningAvgSamplesPerSec=6.327515890612733, CurrSamplesPerSec=5.675125729502997, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:32:04,723] [INFO] [timer.py:197:stop] 0/1806, RunningAvgSamplesPerSec=6.327534028822243, CurrSamplesPerSec=5.714394561738922, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:32:16,328] [INFO] [timer.py:197:stop] 0/1808, RunningAvgSamplesPerSec=6.327547488856713, CurrSamplesPerSec=5.7109933973552405, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:32:27,941] [INFO] [timer.py:197:stop] 0/1810, RunningAvgSamplesPerSec=6.327525969813436, CurrSamplesPerSec=5.6556872897253445, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:32:39,261] [INFO] [timer.py:197:stop] 0/1812, RunningAvgSamplesPerSec=6.327539814732807, CurrSamplesPerSec=5.705926645929653, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:32:50,811] [INFO] [timer.py:197:stop] 0/1814, RunningAvgSamplesPerSec=6.327557862560616, CurrSamplesPerSec=5.702386549923387, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:33:02,582] [INFO] [timer.py:197:stop] 0/1816, RunningAvgSamplesPerSec=6.327521828081135, CurrSamplesPerSec=5.651765591854346, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:33:13,943] [INFO] [timer.py:197:stop] 0/1818, RunningAvgSamplesPerSec=6.327518212922253, CurrSamplesPerSec=5.679664624133516, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:33:25,525] [INFO] [logging.py:68:log_dist] [Rank 0] step=910, skipped=5, lr=[9.102222222222224e-06], mom=[[0.9, 0.999]]
[2022-12-16 22:33:25,526] [INFO] [timer.py:197:stop] 0/1820, RunningAvgSamplesPerSec=6.32753376849523, CurrSamplesPerSec=5.699907972659344, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:33:36,981] [INFO] [timer.py:197:stop] 0/1822, RunningAvgSamplesPerSec=6.327490357640737, CurrSamplesPerSec=5.683644643984692, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:33:48,325] [INFO] [timer.py:197:stop] 0/1824, RunningAvgSamplesPerSec=6.3274986839712275, CurrSamplesPerSec=5.688544098830273, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:33:59,676] [INFO] [timer.py:197:stop] 0/1826, RunningAvgSamplesPerSec=6.327526659579, CurrSamplesPerSec=5.709614439656007, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:34:11,315] [INFO] [timer.py:197:stop] 0/1828, RunningAvgSamplesPerSec=6.327508865546779, CurrSamplesPerSec=5.671136780061463, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:34:22,675] [INFO] [timer.py:197:stop] 0/1830, RunningAvgSamplesPerSec=6.327503801542838, CurrSamplesPerSec=5.6768350081309285, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:34:34,008] [INFO] [timer.py:197:stop] 0/1832, RunningAvgSamplesPerSec=6.327506325462348, CurrSamplesPerSec=5.701579655712555, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:34:45,563] [INFO] [timer.py:197:stop] 0/1834, RunningAvgSamplesPerSec=6.327367375204487, CurrSamplesPerSec=5.699337974721977, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:34:56,901] [INFO] [timer.py:197:stop] 0/1836, RunningAvgSamplesPerSec=6.327379195232485, CurrSamplesPerSec=5.704727132017407, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:35:08,597] [INFO] [timer.py:197:stop] 0/1838, RunningAvgSamplesPerSec=6.3271557453667935, CurrSamplesPerSec=5.376965988972667, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:35:20,153] [INFO] [logging.py:68:log_dist] [Rank 0] step=920, skipped=5, lr=[9.080000000000001e-06], mom=[[0.9, 0.999]]
[2022-12-16 22:35:20,155] [INFO] [timer.py:197:stop] 0/1840, RunningAvgSamplesPerSec=6.327150200517756, CurrSamplesPerSec=5.699684558608608, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:35:31,485] [INFO] [timer.py:197:stop] 0/1842, RunningAvgSamplesPerSec=6.327162084321393, CurrSamplesPerSec=5.706075347062752, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:35:42,800] [INFO] [timer.py:197:stop] 0/1844, RunningAvgSamplesPerSec=6.327168810082029, CurrSamplesPerSec=5.691259909466644, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:35:54,479] [INFO] [timer.py:197:stop] 0/1846, RunningAvgSamplesPerSec=6.32719038402284, CurrSamplesPerSec=5.714115760953486, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:36:05,815] [INFO] [timer.py:197:stop] 0/1848, RunningAvgSamplesPerSec=6.327205165796917, CurrSamplesPerSec=5.698616624872424, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:36:17,200] [INFO] [timer.py:197:stop] 0/1850, RunningAvgSamplesPerSec=6.327176775421694, CurrSamplesPerSec=5.642583654232158, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0204, 'learning_rate': 9.06888888888889e-06, 'epoch': 3.92}
[2022-12-16 22:36:28,679] [INFO] [timer.py:197:stop] 0/1852, RunningAvgSamplesPerSec=6.327188291303244, CurrSamplesPerSec=5.703051179780791, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:36:40,131] [INFO] [timer.py:197:stop] 0/1854, RunningAvgSamplesPerSec=6.327226112581387, CurrSamplesPerSec=5.734747256027386, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:36:51,510] [INFO] [timer.py:197:stop] 0/1856, RunningAvgSamplesPerSec=6.327212472477824, CurrSamplesPerSec=5.660549146666839, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:37:02,922] [INFO] [timer.py:197:stop] 0/1858, RunningAvgSamplesPerSec=6.327249935928411, CurrSamplesPerSec=5.742332367647951, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:37:14,245] [INFO] [logging.py:68:log_dist] [Rank 0] step=930, skipped=5, lr=[9.057777777777779e-06], mom=[[0.9, 0.999]]
[2022-12-16 22:37:14,247] [INFO] [timer.py:197:stop] 0/1860, RunningAvgSamplesPerSec=6.327266281775782, CurrSamplesPerSec=5.716212055721702, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:37:25,655] [INFO] [timer.py:197:stop] 0/1862, RunningAvgSamplesPerSec=6.3272304696234105, CurrSamplesPerSec=5.653700635485529, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:37:36,996] [INFO] [timer.py:197:stop] 0/1864, RunningAvgSamplesPerSec=6.327240547607681, CurrSamplesPerSec=5.701383235398178, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:37:48,403] [INFO] [timer.py:197:stop] 0/1866, RunningAvgSamplesPerSec=6.327233090336316, CurrSamplesPerSec=5.6837647469343695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:37:59,744] [INFO] [timer.py:197:stop] 0/1868, RunningAvgSamplesPerSec=6.327240665312632, CurrSamplesPerSec=5.687485878507279, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:38:11,229] [INFO] [timer.py:197:stop] 0/1870, RunningAvgSamplesPerSec=6.327244748394163, CurrSamplesPerSec=5.691447427173725, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:38:22,637] [INFO] [timer.py:197:stop] 0/1872, RunningAvgSamplesPerSec=6.3272248737369186, CurrSamplesPerSec=5.657825353438808, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:38:34,073] [INFO] [timer.py:197:stop] 0/1874, RunningAvgSamplesPerSec=6.327195858796056, CurrSamplesPerSec=5.637339864727594, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:38:45,464] [INFO] [timer.py:197:stop] 0/1876, RunningAvgSamplesPerSec=6.327186040553401, CurrSamplesPerSec=5.683998228114204, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:38:56,954] [INFO] [timer.py:197:stop] 0/1878, RunningAvgSamplesPerSec=6.327190810218982, CurrSamplesPerSec=5.687580837096232, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:39:08,592] [INFO] [logging.py:68:log_dist] [Rank 0] step=940, skipped=5, lr=[9.035555555555556e-06], mom=[[0.9, 0.999]]
[2022-12-16 22:39:08,594] [INFO] [timer.py:197:stop] 0/1880, RunningAvgSamplesPerSec=6.32700099351177, CurrSamplesPerSec=5.406531701958022, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:39:19,900] [INFO] [timer.py:197:stop] 0/1882, RunningAvgSamplesPerSec=6.327025337615598, CurrSamplesPerSec=5.704069383941154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:39:31,203] [INFO] [timer.py:197:stop] 0/1884, RunningAvgSamplesPerSec=6.327061342294388, CurrSamplesPerSec=5.719577030502328, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:39:42,655] [INFO] [timer.py:197:stop] 0/1886, RunningAvgSamplesPerSec=6.326975385899959, CurrSamplesPerSec=5.544955760650507, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:39:51,166] [INFO] [timer.py:197:stop] 0/1888, RunningAvgSamplesPerSec=6.3286334775112, CurrSamplesPerSec=10.195979896061841, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:40:02,504] [INFO] [timer.py:197:stop] 0/1890, RunningAvgSamplesPerSec=6.328647509479657, CurrSamplesPerSec=5.710262048965369, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:40:14,194] [INFO] [timer.py:197:stop] 0/1892, RunningAvgSamplesPerSec=6.3284261862826945, CurrSamplesPerSec=5.367132041920802, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:40:25,543] [INFO] [timer.py:197:stop] 0/1894, RunningAvgSamplesPerSec=6.32843106698793, CurrSamplesPerSec=5.6931508564600115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:40:36,849] [INFO] [timer.py:197:stop] 0/1896, RunningAvgSamplesPerSec=6.328464305906655, CurrSamplesPerSec=5.728721698325411, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:40:48,477] [INFO] [timer.py:197:stop] 0/1898, RunningAvgSamplesPerSec=6.328273681309304, CurrSamplesPerSec=5.381708745107983, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:40:59,811] [INFO] [logging.py:68:log_dist] [Rank 0] step=950, skipped=5, lr=[9.013333333333334e-06], mom=[[0.9, 0.999]]
[2022-12-16 22:40:59,813] [INFO] [timer.py:197:stop] 0/1900, RunningAvgSamplesPerSec=6.328286660507801, CurrSamplesPerSec=5.713606642642039, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0177, 'learning_rate': 9.013333333333334e-06, 'epoch': 4.03}
[2022-12-16 22:41:11,142] [INFO] [timer.py:197:stop] 0/1902, RunningAvgSamplesPerSec=6.328296409917468, CurrSamplesPerSec=5.689266276962426, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:41:22,496] [INFO] [timer.py:197:stop] 0/1904, RunningAvgSamplesPerSec=6.328285659790885, CurrSamplesPerSec=5.687977576563542, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:41:33,999] [INFO] [timer.py:197:stop] 0/1906, RunningAvgSamplesPerSec=6.328301415114474, CurrSamplesPerSec=5.708266979596024, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:41:45,322] [INFO] [timer.py:197:stop] 0/1908, RunningAvgSamplesPerSec=6.328300809211748, CurrSamplesPerSec=5.697474840820759, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:41:56,767] [INFO] [timer.py:197:stop] 0/1910, RunningAvgSamplesPerSec=6.328245084111687, CurrSamplesPerSec=5.605507272331195, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:42:08,230] [INFO] [timer.py:197:stop] 0/1912, RunningAvgSamplesPerSec=6.32823748333477, CurrSamplesPerSec=5.698080509950753, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:42:19,577] [INFO] [timer.py:197:stop] 0/1914, RunningAvgSamplesPerSec=6.328233811772137, CurrSamplesPerSec=5.692649088685696, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:42:31,224] [INFO] [timer.py:197:stop] 0/1916, RunningAvgSamplesPerSec=6.328043612491889, CurrSamplesPerSec=5.422718309822167, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:42:42,605] [INFO] [timer.py:197:stop] 0/1918, RunningAvgSamplesPerSec=6.3280308734851065, CurrSamplesPerSec=5.674925368944043, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:42:53,953] [INFO] [logging.py:68:log_dist] [Rank 0] step=960, skipped=5, lr=[8.991111111111112e-06], mom=[[0.9, 0.999]]
[2022-12-16 22:42:53,955] [INFO] [timer.py:197:stop] 0/1920, RunningAvgSamplesPerSec=6.328023141891153, CurrSamplesPerSec=5.678259192763114, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:43:05,610] [INFO] [timer.py:197:stop] 0/1922, RunningAvgSamplesPerSec=6.327827961176475, CurrSamplesPerSec=5.4101643964847765, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:43:16,922] [INFO] [timer.py:197:stop] 0/1924, RunningAvgSamplesPerSec=6.32784426629273, CurrSamplesPerSec=5.692889095536372, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:43:28,308] [INFO] [timer.py:197:stop] 0/1926, RunningAvgSamplesPerSec=6.327825843494196, CurrSamplesPerSec=5.682812727446553, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:43:39,676] [INFO] [timer.py:197:stop] 0/1928, RunningAvgSamplesPerSec=6.327806702706537, CurrSamplesPerSec=5.673969112151267, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:43:50,965] [INFO] [timer.py:197:stop] 0/1930, RunningAvgSamplesPerSec=6.327840292960915, CurrSamplesPerSec=5.721387335868599, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:44:02,315] [INFO] [timer.py:197:stop] 0/1932, RunningAvgSamplesPerSec=6.327859995305297, CurrSamplesPerSec=5.708636502563485, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:44:13,930] [INFO] [timer.py:197:stop] 0/1934, RunningAvgSamplesPerSec=6.327838062306126, CurrSamplesPerSec=5.69510153996506, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:44:25,317] [INFO] [timer.py:197:stop] 0/1936, RunningAvgSamplesPerSec=6.327819423140711, CurrSamplesPerSec=5.685719600318748, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:44:36,691] [INFO] [timer.py:197:stop] 0/1938, RunningAvgSamplesPerSec=6.327798215144098, CurrSamplesPerSec=5.670691833157158, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:44:48,357] [INFO] [logging.py:68:log_dist] [Rank 0] step=970, skipped=5, lr=[8.96888888888889e-06], mom=[[0.9, 0.999]]
[2022-12-16 22:44:48,359] [INFO] [timer.py:197:stop] 0/1940, RunningAvgSamplesPerSec=6.327595320689516, CurrSamplesPerSec=5.7164657401281005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:44:59,718] [INFO] [timer.py:197:stop] 0/1942, RunningAvgSamplesPerSec=6.327599374628342, CurrSamplesPerSec=5.7112604712469635, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:45:11,048] [INFO] [timer.py:197:stop] 0/1944, RunningAvgSamplesPerSec=6.327614975553183, CurrSamplesPerSec=5.709293604393959, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:45:22,718] [INFO] [timer.py:197:stop] 0/1946, RunningAvgSamplesPerSec=6.327627028381739, CurrSamplesPerSec=5.716932997109624, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:45:34,352] [INFO] [timer.py:197:stop] 0/1948, RunningAvgSamplesPerSec=6.327614865246932, CurrSamplesPerSec=5.685503318103534, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:45:45,765] [INFO] [timer.py:197:stop] 0/1950, RunningAvgSamplesPerSec=6.327580884082605, CurrSamplesPerSec=5.649820694504452, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0101, 'learning_rate': 8.957777777777778e-06, 'epoch': 4.13}
[2022-12-16 22:45:57,261] [INFO] [timer.py:197:stop] 0/1952, RunningAvgSamplesPerSec=6.32759681131148, CurrSamplesPerSec=5.711184404892128, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:46:08,750] [INFO] [timer.py:197:stop] 0/1954, RunningAvgSamplesPerSec=6.327597897269227, CurrSamplesPerSec=5.704721797659587, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:46:20,109] [INFO] [timer.py:197:stop] 0/1956, RunningAvgSamplesPerSec=6.327596837568829, CurrSamplesPerSec=5.676017325551704, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:46:31,499] [INFO] [timer.py:197:stop] 0/1958, RunningAvgSamplesPerSec=6.3276281314148894, CurrSamplesPerSec=5.714223774783396, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:46:42,824] [INFO] [logging.py:68:log_dist] [Rank 0] step=980, skipped=5, lr=[8.946666666666669e-06], mom=[[0.9, 0.999]]
[2022-12-16 22:46:42,826] [INFO] [timer.py:197:stop] 0/1960, RunningAvgSamplesPerSec=6.327636555796214, CurrSamplesPerSec=5.699254723419047, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:46:54,425] [INFO] [timer.py:197:stop] 0/1962, RunningAvgSamplesPerSec=6.327482014444832, CurrSamplesPerSec=5.433102247841417, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:47:05,753] [INFO] [timer.py:197:stop] 0/1964, RunningAvgSamplesPerSec=6.327500616331316, CurrSamplesPerSec=5.705595310433084, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:47:17,115] [INFO] [timer.py:197:stop] 0/1966, RunningAvgSamplesPerSec=6.327492170652153, CurrSamplesPerSec=5.685668538891042, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:47:28,632] [INFO] [timer.py:197:stop] 0/1968, RunningAvgSamplesPerSec=6.327392003434301, CurrSamplesPerSec=5.509042049946703, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:47:40,071] [INFO] [timer.py:197:stop] 0/1970, RunningAvgSamplesPerSec=6.327330482222493, CurrSamplesPerSec=5.6039946277232415, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:47:51,426] [INFO] [timer.py:197:stop] 0/1972, RunningAvgSamplesPerSec=6.3273298008094425, CurrSamplesPerSec=5.7071977690610645, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:48:02,785] [INFO] [timer.py:197:stop] 0/1974, RunningAvgSamplesPerSec=6.327328324697508, CurrSamplesPerSec=5.662362644142834, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:48:14,144] [INFO] [timer.py:197:stop] 0/1976, RunningAvgSamplesPerSec=6.327325920760691, CurrSamplesPerSec=5.696949096715646, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:48:25,463] [INFO] [timer.py:197:stop] 0/1978, RunningAvgSamplesPerSec=6.3273483838352025, CurrSamplesPerSec=5.7194220192270775, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:48:37,007] [INFO] [logging.py:68:log_dist] [Rank 0] step=990, skipped=5, lr=[8.924444444444446e-06], mom=[[0.9, 0.999]]
[2022-12-16 22:48:37,009] [INFO] [timer.py:197:stop] 0/1980, RunningAvgSamplesPerSec=6.327227334854434, CurrSamplesPerSec=5.493117913683248, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:48:48,361] [INFO] [timer.py:197:stop] 0/1982, RunningAvgSamplesPerSec=6.327218692310974, CurrSamplesPerSec=5.67055886580819, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:48:59,691] [INFO] [timer.py:197:stop] 0/1984, RunningAvgSamplesPerSec=6.3272315122953255, CurrSamplesPerSec=5.713750149849061, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:49:11,345] [INFO] [timer.py:197:stop] 0/1986, RunningAvgSamplesPerSec=6.327033256452263, CurrSamplesPerSec=5.3792464182237305, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:49:22,687] [INFO] [timer.py:197:stop] 0/1988, RunningAvgSamplesPerSec=6.32703996328013, CurrSamplesPerSec=5.709569748858481, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:49:34,309] [INFO] [timer.py:197:stop] 0/1990, RunningAvgSamplesPerSec=6.327031926644361, CurrSamplesPerSec=5.697876832079683, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:49:46,057] [INFO] [timer.py:197:stop] 0/1992, RunningAvgSamplesPerSec=6.327036540568657, CurrSamplesPerSec=5.711242487325227, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:49:57,349] [INFO] [timer.py:197:stop] 0/1994, RunningAvgSamplesPerSec=6.32706622307384, CurrSamplesPerSec=5.728870856379531, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:50:08,810] [INFO] [timer.py:197:stop] 0/1996, RunningAvgSamplesPerSec=6.327081375070768, CurrSamplesPerSec=5.718303554862198, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:50:20,250] [INFO] [timer.py:197:stop] 0/1998, RunningAvgSamplesPerSec=6.327057320293952, CurrSamplesPerSec=5.702582070221039, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 22:50:31,570] [INFO] [logging.py:68:log_dist] [Rank 0] step=1000, skipped=5, lr=[8.902222222222224e-06], mom=[[0.9, 0.999]]
[2022-12-16 22:50:31,571] [INFO] [timer.py:197:stop] 0/2000, RunningAvgSamplesPerSec=6.327079086755388, CurrSamplesPerSec=5.695145037924359, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0106, 'learning_rate': 8.902222222222224e-06, 'epoch': 4.24}
{'eval_loss': 0.1624755859375, 'eval_wer': 9.988766321062228, 'eval_runtime': 2123.3545, 'eval_samples_per_second': 3.633, 'eval_steps_per_second': 0.454, 'epoch': 4.24}
[2022-12-16 23:25:59,883] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step1000 is begin to save!
[2022-12-16 23:25:59,893] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-1000/global_step1000/mp_rank_00_model_states.pt
[2022-12-16 23:25:59,893] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving ./checkpoint-1000/global_step1000/mp_rank_00_model_states.pt...
[2022-12-16 23:26:04,551] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved ./checkpoint-1000/global_step1000/mp_rank_00_model_states.pt.
[2022-12-16 23:26:04,552] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving ./checkpoint-1000/global_step1000/zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2022-12-16 23:26:26,839] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved ./checkpoint-1000/global_step1000/zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2022-12-16 23:26:26,839] [INFO] [engine.py:3269:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-1000/global_step1000/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2022-12-16 23:26:26,839] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now!
[2022-12-16 23:28:19,593] [INFO] [timer.py:197:stop] 0/2002, RunningAvgSamplesPerSec=6.32691884455914, CurrSamplesPerSec=5.4362697760722805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:28:30,905] [INFO] [timer.py:197:stop] 0/2004, RunningAvgSamplesPerSec=6.326938914682652, CurrSamplesPerSec=5.695573286638631, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:28:42,212] [INFO] [timer.py:197:stop] 0/2006, RunningAvgSamplesPerSec=6.326952016228341, CurrSamplesPerSec=5.708716628834756, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:28:53,528] [INFO] [timer.py:197:stop] 0/2008, RunningAvgSamplesPerSec=6.326964531480429, CurrSamplesPerSec=5.706633106005204, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:29:04,858] [INFO] [timer.py:197:stop] 0/2010, RunningAvgSamplesPerSec=6.326969594684604, CurrSamplesPerSec=5.7054209260205395, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:29:16,139] [INFO] [timer.py:197:stop] 0/2012, RunningAvgSamplesPerSec=6.327003835032759, CurrSamplesPerSec=5.727500606066192, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:29:27,485] [INFO] [timer.py:197:stop] 0/2014, RunningAvgSamplesPerSec=6.327001593227679, CurrSamplesPerSec=5.687480335337347, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:29:38,758] [INFO] [timer.py:197:stop] 0/2016, RunningAvgSamplesPerSec=6.327025476720991, CurrSamplesPerSec=5.725522783802955, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:29:50,077] [INFO] [timer.py:197:stop] 0/2018, RunningAvgSamplesPerSec=6.327022878790163, CurrSamplesPerSec=5.6997549938674155, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:30:01,423] [INFO] [logging.py:68:log_dist] [Rank 0] step=1010, skipped=5, lr=[8.880000000000001e-06], mom=[[0.9, 0.999]]
[2022-12-16 23:30:01,424] [INFO] [timer.py:197:stop] 0/2020, RunningAvgSamplesPerSec=6.327014036395244, CurrSamplesPerSec=5.670120237564401, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:30:12,750] [INFO] [timer.py:197:stop] 0/2022, RunningAvgSamplesPerSec=6.3270172363677, CurrSamplesPerSec=5.679545415403237, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:30:24,037] [INFO] [timer.py:197:stop] 0/2024, RunningAvgSamplesPerSec=6.32704175668712, CurrSamplesPerSec=5.717044770261595, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:30:35,311] [INFO] [timer.py:197:stop] 0/2026, RunningAvgSamplesPerSec=6.327067537750394, CurrSamplesPerSec=5.727426794953754, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:30:46,607] [INFO] [timer.py:197:stop] 0/2028, RunningAvgSamplesPerSec=6.327089040336206, CurrSamplesPerSec=5.693161240461662, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:30:58,050] [INFO] [timer.py:197:stop] 0/2030, RunningAvgSamplesPerSec=6.3271135447751226, CurrSamplesPerSec=5.718585688270621, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:31:09,355] [INFO] [timer.py:197:stop] 0/2032, RunningAvgSamplesPerSec=6.327130920305157, CurrSamplesPerSec=5.722399655712234, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:31:20,707] [INFO] [timer.py:197:stop] 0/2034, RunningAvgSamplesPerSec=6.327119374960312, CurrSamplesPerSec=5.675644093117829, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:31:32,059] [INFO] [timer.py:197:stop] 0/2036, RunningAvgSamplesPerSec=6.3270974490717675, CurrSamplesPerSec=5.650667006391652, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:31:43,453] [INFO] [timer.py:197:stop] 0/2038, RunningAvgSamplesPerSec=6.327060451393332, CurrSamplesPerSec=5.645447844234822, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:31:54,806] [INFO] [logging.py:68:log_dist] [Rank 0] step=1020, skipped=5, lr=[8.857777777777779e-06], mom=[[0.9, 0.999]]
[2022-12-16 23:31:54,808] [INFO] [timer.py:197:stop] 0/2040, RunningAvgSamplesPerSec=6.327042536606945, CurrSamplesPerSec=5.676920727197855, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:32:06,237] [INFO] [timer.py:197:stop] 0/2042, RunningAvgSamplesPerSec=6.327001069035571, CurrSamplesPerSec=5.630803660428348, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:32:17,545] [INFO] [timer.py:197:stop] 0/2044, RunningAvgSamplesPerSec=6.327008785154406, CurrSamplesPerSec=5.696139630864212, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:32:28,911] [INFO] [timer.py:197:stop] 0/2046, RunningAvgSamplesPerSec=6.326978699203454, CurrSamplesPerSec=5.636375402394337, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:32:40,341] [INFO] [timer.py:197:stop] 0/2048, RunningAvgSamplesPerSec=6.326958900442164, CurrSamplesPerSec=5.643415934825894, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:32:51,716] [INFO] [timer.py:197:stop] 0/2050, RunningAvgSamplesPerSec=6.326956125505802, CurrSamplesPerSec=5.689418934411699, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0098, 'learning_rate': 8.846666666666668e-06, 'epoch': 4.34}
[2022-12-16 23:33:03,018] [INFO] [timer.py:197:stop] 0/2052, RunningAvgSamplesPerSec=6.3269793020330125, CurrSamplesPerSec=5.7128155365674695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:33:14,345] [INFO] [timer.py:197:stop] 0/2054, RunningAvgSamplesPerSec=6.326988840063246, CurrSamplesPerSec=5.705008897278177, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:33:25,698] [INFO] [timer.py:197:stop] 0/2056, RunningAvgSamplesPerSec=6.3269836079402495, CurrSamplesPerSec=5.6747514146635405, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:33:37,014] [INFO] [timer.py:197:stop] 0/2058, RunningAvgSamplesPerSec=6.326998426808016, CurrSamplesPerSec=5.708111852476962, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:33:48,355] [INFO] [logging.py:68:log_dist] [Rank 0] step=1030, skipped=5, lr=[8.835555555555557e-06], mom=[[0.9, 0.999]]
[2022-12-16 23:33:48,357] [INFO] [timer.py:197:stop] 0/2060, RunningAvgSamplesPerSec=6.326992420904756, CurrSamplesPerSec=5.670153533559918, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:33:59,735] [INFO] [timer.py:197:stop] 0/2062, RunningAvgSamplesPerSec=6.326985330959451, CurrSamplesPerSec=5.671009542558283, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:34:11,227] [INFO] [timer.py:197:stop] 0/2064, RunningAvgSamplesPerSec=6.326965287952393, CurrSamplesPerSec=5.665643214621001, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:34:22,581] [INFO] [timer.py:197:stop] 0/2066, RunningAvgSamplesPerSec=6.326978858724929, CurrSamplesPerSec=5.681988992282914, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:34:33,984] [INFO] [timer.py:197:stop] 0/2068, RunningAvgSamplesPerSec=6.3269599699551415, CurrSamplesPerSec=5.669742032086501, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:34:45,300] [INFO] [timer.py:197:stop] 0/2070, RunningAvgSamplesPerSec=6.326969935394964, CurrSamplesPerSec=5.712483643684802, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:34:56,646] [INFO] [timer.py:197:stop] 0/2072, RunningAvgSamplesPerSec=6.326979694877666, CurrSamplesPerSec=5.7023923644599925, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:35:08,016] [INFO] [timer.py:197:stop] 0/2074, RunningAvgSamplesPerSec=6.326982548684312, CurrSamplesPerSec=5.703492736656249, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:35:19,363] [INFO] [timer.py:197:stop] 0/2076, RunningAvgSamplesPerSec=6.326988173584383, CurrSamplesPerSec=5.683898093630149, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:35:30,742] [INFO] [timer.py:197:stop] 0/2078, RunningAvgSamplesPerSec=6.3269830341859095, CurrSamplesPerSec=5.673695441170683, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:35:42,209] [INFO] [logging.py:68:log_dist] [Rank 0] step=1040, skipped=5, lr=[8.813333333333334e-06], mom=[[0.9, 0.999]]
[2022-12-16 23:35:42,211] [INFO] [timer.py:197:stop] 0/2080, RunningAvgSamplesPerSec=6.326975967919922, CurrSamplesPerSec=5.667466200267374, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:35:53,587] [INFO] [timer.py:197:stop] 0/2082, RunningAvgSamplesPerSec=6.32695109038253, CurrSamplesPerSec=5.666170611529979, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:36:04,935] [INFO] [timer.py:197:stop] 0/2084, RunningAvgSamplesPerSec=6.326944440471758, CurrSamplesPerSec=5.69634632742204, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:36:16,251] [INFO] [timer.py:197:stop] 0/2086, RunningAvgSamplesPerSec=6.326958149657141, CurrSamplesPerSec=5.7132231002860445, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:36:27,626] [INFO] [timer.py:197:stop] 0/2088, RunningAvgSamplesPerSec=6.326936819769224, CurrSamplesPerSec=5.6444559187739705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:36:39,014] [INFO] [timer.py:197:stop] 0/2090, RunningAvgSamplesPerSec=6.326958427850613, CurrSamplesPerSec=5.718727009077727, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:36:50,379] [INFO] [timer.py:197:stop] 0/2092, RunningAvgSamplesPerSec=6.326959830114065, CurrSamplesPerSec=5.6926826497948, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:37:01,709] [INFO] [timer.py:197:stop] 0/2094, RunningAvgSamplesPerSec=6.326965638724746, CurrSamplesPerSec=5.685005305374708, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:37:13,012] [INFO] [timer.py:197:stop] 0/2096, RunningAvgSamplesPerSec=6.3269861274948385, CurrSamplesPerSec=5.728556410931646, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:37:24,339] [INFO] [timer.py:197:stop] 0/2098, RunningAvgSamplesPerSec=6.326991368641418, CurrSamplesPerSec=5.6820460013056815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:37:35,844] [INFO] [logging.py:68:log_dist] [Rank 0] step=1050, skipped=5, lr=[8.791111111111112e-06], mom=[[0.9, 0.999]]
[2022-12-16 23:37:35,846] [INFO] [timer.py:197:stop] 0/2100, RunningAvgSamplesPerSec=6.326890220539331, CurrSamplesPerSec=5.5338151432021565, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0109, 'learning_rate': 8.791111111111112e-06, 'epoch': 4.45}
[2022-12-16 23:37:47,173] [INFO] [timer.py:197:stop] 0/2102, RunningAvgSamplesPerSec=6.326888480761292, CurrSamplesPerSec=5.678281053440078, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:37:58,497] [INFO] [timer.py:197:stop] 0/2104, RunningAvgSamplesPerSec=6.326901461244185, CurrSamplesPerSec=5.715132566323613, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:38:09,962] [INFO] [timer.py:197:stop] 0/2106, RunningAvgSamplesPerSec=6.326842751936067, CurrSamplesPerSec=5.588156937021963, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:38:21,235] [INFO] [timer.py:197:stop] 0/2108, RunningAvgSamplesPerSec=6.326883905218231, CurrSamplesPerSec=5.745308582057639, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:38:32,501] [INFO] [timer.py:197:stop] 0/2110, RunningAvgSamplesPerSec=6.326919605153342, CurrSamplesPerSec=5.71905889677904, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:38:44,049] [INFO] [timer.py:197:stop] 0/2112, RunningAvgSamplesPerSec=6.3267923986561145, CurrSamplesPerSec=5.4640084207893445, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:38:55,381] [INFO] [timer.py:197:stop] 0/2114, RunningAvgSamplesPerSec=6.326795940444241, CurrSamplesPerSec=5.689030433691431, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:39:06,736] [INFO] [timer.py:197:stop] 0/2116, RunningAvgSamplesPerSec=6.326797512321113, CurrSamplesPerSec=5.684546141744876, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:39:18,197] [INFO] [timer.py:197:stop] 0/2118, RunningAvgSamplesPerSec=6.326724028446822, CurrSamplesPerSec=5.569149724324082, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-16 23:39:29,514] [INFO] [logging.py:68:log_dist] [Rank 0] step=1060, skipped=5, lr=[8.76888888888889e-06], mom=[[0.9, 0.999]]
[2022-12-16 23:39:29,516] [INFO] [timer.py:197:stop] 0/2120, 
RunningAvgSamplesPerSec=6.326731548822072, CurrSamplesPerSec=5.7124179991115005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:39:40,843] [INFO] [timer.py:197:stop] 0/2122, RunningAvgSamplesPerSec=6.326737425035551, CurrSamplesPerSec=5.714336415176261, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:39:52,446] [INFO] [timer.py:197:stop] 0/2124, RunningAvgSamplesPerSec=6.326762471651918, CurrSamplesPerSec=5.718454120004614, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:40:03,786] [INFO] [timer.py:197:stop] 0/2126, RunningAvgSamplesPerSec=6.326765987597059, CurrSamplesPerSec=5.6890827612364046, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:40:15,174] [INFO] [timer.py:197:stop] 0/2128, RunningAvgSamplesPerSec=6.3267923368660135, CurrSamplesPerSec=5.709115594022861, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:40:26,696] [INFO] [timer.py:197:stop] 0/2130, RunningAvgSamplesPerSec=6.326821169454405, CurrSamplesPerSec=5.706149094055909, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:40:38,041] [INFO] [timer.py:197:stop] 0/2132, RunningAvgSamplesPerSec=6.326823566813653, CurrSamplesPerSec=5.685069358215456, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:40:49,504] [INFO] [timer.py:197:stop] 0/2134, RunningAvgSamplesPerSec=6.326811885483646, CurrSamplesPerSec=5.675575692495808, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:41:00,946] [INFO] [timer.py:197:stop] 0/2136, RunningAvgSamplesPerSec=6.326799375130904, CurrSamplesPerSec=5.682672694867065, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:41:12,263] [INFO] [timer.py:197:stop] 0/2138, RunningAvgSamplesPerSec=6.326803172324738, CurrSamplesPerSec=5.6830444459960425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:41:23,789] [INFO] [logging.py:68:log_dist] [Rank 0] step=1070, skipped=5, lr=[8.746666666666667e-06], mom=[[0.9, 0.999]] [2022-12-16 23:41:23,791] [INFO] [timer.py:197:stop] 0/2140, 
RunningAvgSamplesPerSec=6.326814093016492, CurrSamplesPerSec=5.6949693585774135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:41:35,241] [INFO] [timer.py:197:stop] 0/2142, RunningAvgSamplesPerSec=6.326802233599372, CurrSamplesPerSec=5.693648123620178, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:41:46,557] [INFO] [timer.py:197:stop] 0/2144, RunningAvgSamplesPerSec=6.326814718958849, CurrSamplesPerSec=5.692429864700926, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:41:57,844] [INFO] [timer.py:197:stop] 0/2146, RunningAvgSamplesPerSec=6.326832085580104, CurrSamplesPerSec=5.6958365030451725, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:42:09,151] [INFO] [timer.py:197:stop] 0/2148, RunningAvgSamplesPerSec=6.326850473492747, CurrSamplesPerSec=5.708753293424126, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:42:20,475] [INFO] [timer.py:197:stop] 0/2150, RunningAvgSamplesPerSec=6.326859635258952, CurrSamplesPerSec=5.688270707613868, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0098, 'learning_rate': 8.735555555555556e-06, 'epoch': 4.56} [2022-12-16 23:42:31,769] [INFO] [timer.py:197:stop] 0/2152, RunningAvgSamplesPerSec=6.326886225644251, CurrSamplesPerSec=5.717377436917904, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:42:43,104] [INFO] [timer.py:197:stop] 0/2154, RunningAvgSamplesPerSec=6.3268885958737, CurrSamplesPerSec=5.718993344686862, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:42:54,383] [INFO] [timer.py:197:stop] 0/2156, RunningAvgSamplesPerSec=6.326912510207333, CurrSamplesPerSec=5.71347919482766, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:43:05,649] [INFO] [timer.py:197:stop] 0/2158, RunningAvgSamplesPerSec=6.326952236351088, CurrSamplesPerSec=5.745678243164284, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:43:17,030] [INFO] [logging.py:68:log_dist] [Rank 0] step=1080, skipped=5, lr=[8.724444444444445e-06], mom=[[0.9, 0.999]] 
[2022-12-16 23:43:17,032] [INFO] [timer.py:197:stop] 0/2160, RunningAvgSamplesPerSec=6.326925724205436, CurrSamplesPerSec=5.710000412665337, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:43:28,413] [INFO] [timer.py:197:stop] 0/2162, RunningAvgSamplesPerSec=6.326951473177989, CurrSamplesPerSec=5.713769365768101, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:43:40,014] [INFO] [timer.py:197:stop] 0/2164, RunningAvgSamplesPerSec=6.326797110978181, CurrSamplesPerSec=5.410640719876528, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:43:51,410] [INFO] [timer.py:197:stop] 0/2166, RunningAvgSamplesPerSec=6.326801039464514, CurrSamplesPerSec=5.689852111364271, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:44:02,776] [INFO] [timer.py:197:stop] 0/2168, RunningAvgSamplesPerSec=6.3268023012309405, CurrSamplesPerSec=5.672488341976959, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:44:14,145] [INFO] [timer.py:197:stop] 0/2170, RunningAvgSamplesPerSec=6.3267722977202325, CurrSamplesPerSec=5.65816762228027, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:44:25,666] [INFO] [timer.py:197:stop] 0/2172, RunningAvgSamplesPerSec=6.326781274881491, CurrSamplesPerSec=5.69896820264145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:44:36,942] [INFO] [timer.py:197:stop] 0/2174, RunningAvgSamplesPerSec=6.326807176968383, CurrSamplesPerSec=5.711531701963128, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:44:48,246] [INFO] [timer.py:197:stop] 0/2176, RunningAvgSamplesPerSec=6.326829037500474, CurrSamplesPerSec=5.716765467323886, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:44:59,728] [INFO] [timer.py:197:stop] 0/2178, RunningAvgSamplesPerSec=6.3268633695582235, CurrSamplesPerSec=5.7243706817524425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:45:11,026] [INFO] [logging.py:68:log_dist] [Rank 0] step=1090, skipped=5, lr=[8.702222222222222e-06], mom=[[0.9, 0.999]] [2022-12-16 
23:45:11,027] [INFO] [timer.py:197:stop] 0/2180, RunningAvgSamplesPerSec=6.326867601605905, CurrSamplesPerSec=5.683175111827742, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:45:22,528] [INFO] [timer.py:197:stop] 0/2182, RunningAvgSamplesPerSec=6.3267694111200745, CurrSamplesPerSec=5.507590052899119, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:45:33,901] [INFO] [timer.py:197:stop] 0/2184, RunningAvgSamplesPerSec=6.3267964922491355, CurrSamplesPerSec=5.718324506818116, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:45:45,296] [INFO] [timer.py:197:stop] 0/2186, RunningAvgSamplesPerSec=6.326810767985056, CurrSamplesPerSec=5.721014453371941, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:45:56,924] [INFO] [timer.py:197:stop] 0/2188, RunningAvgSamplesPerSec=6.326635937130851, CurrSamplesPerSec=5.385385720838578, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:46:08,265] [INFO] [timer.py:197:stop] 0/2190, RunningAvgSamplesPerSec=6.326648563105169, CurrSamplesPerSec=5.7219939529667325, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:46:19,609] [INFO] [timer.py:197:stop] 0/2192, RunningAvgSamplesPerSec=6.326656863235978, CurrSamplesPerSec=5.6884170433826915, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:46:30,957] [INFO] [timer.py:197:stop] 0/2194, RunningAvgSamplesPerSec=6.326641016017453, CurrSamplesPerSec=5.63037116066852, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:46:42,464] [INFO] [timer.py:197:stop] 0/2196, RunningAvgSamplesPerSec=6.326632950912346, CurrSamplesPerSec=5.66079743690004, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:46:53,853] [INFO] [timer.py:197:stop] 0/2198, RunningAvgSamplesPerSec=6.326654620094145, CurrSamplesPerSec=5.713162788862371, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:47:05,232] [INFO] [logging.py:68:log_dist] [Rank 0] step=1100, skipped=5, lr=[8.68e-06], mom=[[0.9, 0.999]] [2022-12-16 23:47:05,237] [INFO] 
[timer.py:197:stop] 0/2200, RunningAvgSamplesPerSec=6.326613196775679, CurrSamplesPerSec=5.610251308986749, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0095, 'learning_rate': 8.68e-06, 'epoch': 4.66} [2022-12-16 23:47:16,813] [INFO] [timer.py:197:stop] 0/2202, RunningAvgSamplesPerSec=6.326616301706529, CurrSamplesPerSec=5.687317901155183, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:47:28,304] [INFO] [timer.py:197:stop] 0/2204, RunningAvgSamplesPerSec=6.326636280550874, CurrSamplesPerSec=5.716672940546703, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:47:39,604] [INFO] [timer.py:197:stop] 0/2206, RunningAvgSamplesPerSec=6.326631705077604, CurrSamplesPerSec=5.670984383276135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:47:51,208] [INFO] [timer.py:197:stop] 0/2208, RunningAvgSamplesPerSec=6.326616045712276, CurrSamplesPerSec=5.666151714434629, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:48:02,736] [INFO] [timer.py:197:stop] 0/2210, RunningAvgSamplesPerSec=6.326617519714017, CurrSamplesPerSec=5.679849455627446, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:48:14,047] [INFO] [timer.py:197:stop] 0/2212, RunningAvgSamplesPerSec=6.326616544719444, CurrSamplesPerSec=5.684408671940497, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:48:25,450] [INFO] [timer.py:197:stop] 0/2214, RunningAvgSamplesPerSec=6.326636718841098, CurrSamplesPerSec=5.7177756649699925, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:48:36,769] [INFO] [timer.py:197:stop] 0/2216, RunningAvgSamplesPerSec=6.326639581809611, CurrSamplesPerSec=5.690913384163466, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:48:48,371] [INFO] [timer.py:197:stop] 0/2218, RunningAvgSamplesPerSec=6.326473489332864, CurrSamplesPerSec=5.398720825015092, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:48:59,676] [INFO] [logging.py:68:log_dist] [Rank 0] step=1110, skipped=5, lr=[8.657777777777778e-06], 
mom=[[0.9, 0.999]] [2022-12-16 23:48:59,677] [INFO] [timer.py:197:stop] 0/2220, RunningAvgSamplesPerSec=6.326484511281279, CurrSamplesPerSec=5.696579392761181, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:49:11,062] [INFO] [timer.py:197:stop] 0/2222, RunningAvgSamplesPerSec=6.326506791798435, CurrSamplesPerSec=5.715751001795795, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:49:22,510] [INFO] [timer.py:197:stop] 0/2224, RunningAvgSamplesPerSec=6.32643017700071, CurrSamplesPerSec=5.538575914024925, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:49:34,012] [INFO] [timer.py:197:stop] 0/2226, RunningAvgSamplesPerSec=6.326427190854699, CurrSamplesPerSec=5.682189130264888, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:49:45,549] [INFO] [timer.py:197:stop] 0/2228, RunningAvgSamplesPerSec=6.32642059283797, CurrSamplesPerSec=5.6687824932599185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:49:57,038] [INFO] [timer.py:197:stop] 0/2230, RunningAvgSamplesPerSec=6.326329693673774, CurrSamplesPerSec=5.519171447397812, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:50:08,352] [INFO] [timer.py:197:stop] 0/2232, RunningAvgSamplesPerSec=6.326326483515551, CurrSamplesPerSec=5.688925299229948, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:50:19,681] [INFO] [timer.py:197:stop] 0/2234, RunningAvgSamplesPerSec=6.326324285237307, CurrSamplesPerSec=5.697434693184849, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:50:31,088] [INFO] [timer.py:197:stop] 0/2236, RunningAvgSamplesPerSec=6.326270862338546, CurrSamplesPerSec=5.5946294316419705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:50:42,394] [INFO] [timer.py:197:stop] 0/2238, RunningAvgSamplesPerSec=6.326283074554998, CurrSamplesPerSec=5.682271402610858, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:50:53,976] [INFO] [logging.py:68:log_dist] [Rank 0] step=1120, skipped=5, lr=[8.635555555555555e-06], mom=[[0.9, 
0.999]] [2022-12-16 23:50:53,978] [INFO] [timer.py:197:stop] 0/2240, RunningAvgSamplesPerSec=6.326285225238069, CurrSamplesPerSec=5.684244487296989, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:51:05,362] [INFO] [timer.py:197:stop] 0/2242, RunningAvgSamplesPerSec=6.326285434003683, CurrSamplesPerSec=5.705309364393889, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:51:16,685] [INFO] [timer.py:197:stop] 0/2244, RunningAvgSamplesPerSec=6.326280138057781, CurrSamplesPerSec=5.686927277705469, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:51:28,045] [INFO] [timer.py:197:stop] 0/2246, RunningAvgSamplesPerSec=6.326284228598904, CurrSamplesPerSec=5.698377344512056, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:51:39,504] [INFO] [timer.py:197:stop] 0/2248, RunningAvgSamplesPerSec=6.32628714792387, CurrSamplesPerSec=5.695185636619195, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:51:50,789] [INFO] [timer.py:197:stop] 0/2250, RunningAvgSamplesPerSec=6.326311280608016, CurrSamplesPerSec=5.715411222963278, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0109, 'learning_rate': 8.624444444444446e-06, 'epoch': 4.77} [2022-12-16 23:52:02,112] [INFO] [timer.py:197:stop] 0/2252, RunningAvgSamplesPerSec=6.326315721304854, CurrSamplesPerSec=5.688322056869689, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:52:13,595] [INFO] [timer.py:197:stop] 0/2254, RunningAvgSamplesPerSec=6.326311919093778, CurrSamplesPerSec=5.68811931708187, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:52:24,891] [INFO] [timer.py:197:stop] 0/2256, RunningAvgSamplesPerSec=6.3263219325412035, CurrSamplesPerSec=5.695825384137346, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:52:36,162] [INFO] [timer.py:197:stop] 0/2258, RunningAvgSamplesPerSec=6.3263445079459055, CurrSamplesPerSec=5.699815264594685, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:52:47,467] [INFO] [logging.py:68:log_dist] [Rank 0] 
step=1130, skipped=5, lr=[8.613333333333333e-06], mom=[[0.9, 0.999]] [2022-12-16 23:52:47,468] [INFO] [timer.py:197:stop] 0/2260, RunningAvgSamplesPerSec=6.326355199772841, CurrSamplesPerSec=5.714679715988128, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:52:58,766] [INFO] [timer.py:197:stop] 0/2262, RunningAvgSamplesPerSec=6.3263710788386165, CurrSamplesPerSec=5.703612710407144, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:53:10,048] [INFO] [timer.py:197:stop] 0/2264, RunningAvgSamplesPerSec=6.32638719408835, CurrSamplesPerSec=5.7117968816261335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:53:21,713] [INFO] [timer.py:197:stop] 0/2266, RunningAvgSamplesPerSec=6.326200249143209, CurrSamplesPerSec=5.697177132955527, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:53:33,031] [INFO] [timer.py:197:stop] 0/2268, RunningAvgSamplesPerSec=6.3262049339700885, CurrSamplesPerSec=5.700018839004315, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:53:44,333] [INFO] [timer.py:197:stop] 0/2270, RunningAvgSamplesPerSec=6.326220176723921, CurrSamplesPerSec=5.686674762540661, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:53:55,857] [INFO] [timer.py:197:stop] 0/2272, RunningAvgSamplesPerSec=6.326179349106179, CurrSamplesPerSec=5.696734619054877, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:54:07,129] [INFO] [timer.py:197:stop] 0/2274, RunningAvgSamplesPerSec=6.326192560287468, CurrSamplesPerSec=5.71834692070838, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:54:18,372] [INFO] [timer.py:197:stop] 0/2276, RunningAvgSamplesPerSec=6.326232118597809, CurrSamplesPerSec=5.741100555354587, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:54:29,739] [INFO] [timer.py:197:stop] 0/2278, RunningAvgSamplesPerSec=6.326210908425242, CurrSamplesPerSec=5.7071654926568005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:54:41,057] [INFO] [logging.py:68:log_dist] [Rank 0] step=1140, 
skipped=5, lr=[8.591111111111112e-06], mom=[[0.9, 0.999]] [2022-12-16 23:54:41,058] [INFO] [timer.py:197:stop] 0/2280, RunningAvgSamplesPerSec=6.3262178608961595, CurrSamplesPerSec=5.7121705083245375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:54:52,376] [INFO] [timer.py:197:stop] 0/2282, RunningAvgSamplesPerSec=6.3262245491701075, CurrSamplesPerSec=5.720133044142513, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:55:03,710] [INFO] [timer.py:197:stop] 0/2284, RunningAvgSamplesPerSec=6.32621352411815, CurrSamplesPerSec=5.684511472725911, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:55:14,977] [INFO] [timer.py:197:stop] 0/2286, RunningAvgSamplesPerSec=6.326238918068466, CurrSamplesPerSec=5.719674526174559, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:55:26,308] [INFO] [timer.py:197:stop] 0/2288, RunningAvgSamplesPerSec=6.3262380095475725, CurrSamplesPerSec=5.681550999090691, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:55:37,984] [INFO] [timer.py:197:stop] 0/2290, RunningAvgSamplesPerSec=6.326233512446577, CurrSamplesPerSec=5.691795948502412, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:55:49,380] [INFO] [timer.py:197:stop] 0/2292, RunningAvgSamplesPerSec=6.326239816856204, CurrSamplesPerSec=5.708440324097912, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:56:00,697] [INFO] [timer.py:197:stop] 0/2294, RunningAvgSamplesPerSec=6.326241228905768, CurrSamplesPerSec=5.710928273024266, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:56:12,113] [INFO] [timer.py:197:stop] 0/2296, RunningAvgSamplesPerSec=6.326238293926612, CurrSamplesPerSec=5.679791769604584, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:56:23,643] [INFO] [timer.py:197:stop] 0/2298, RunningAvgSamplesPerSec=6.326238784711817, CurrSamplesPerSec=5.698628480539143, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:56:34,951] [INFO] [logging.py:68:log_dist] [Rank 0] step=1150, skipped=5, 
lr=[8.56888888888889e-06], mom=[[0.9, 0.999]] [2022-12-16 23:56:34,953] [INFO] [timer.py:197:stop] 0/2300, RunningAvgSamplesPerSec=6.326240334499451, CurrSamplesPerSec=5.6804126769895165, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0102, 'learning_rate': 8.56888888888889e-06, 'epoch': 4.87} [2022-12-16 23:56:46,443] [INFO] [timer.py:197:stop] 0/2302, RunningAvgSamplesPerSec=6.326247554449026, CurrSamplesPerSec=5.690969124591848, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:56:57,778] [INFO] [timer.py:197:stop] 0/2304, RunningAvgSamplesPerSec=6.326248708829878, CurrSamplesPerSec=5.697551509866096, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:57:09,072] [INFO] [timer.py:197:stop] 0/2306, RunningAvgSamplesPerSec=6.326259242853581, CurrSamplesPerSec=5.697326345762504, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:57:20,381] [INFO] [timer.py:197:stop] 0/2308, RunningAvgSamplesPerSec=6.326269961524548, CurrSamplesPerSec=5.69958241847372, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:57:31,664] [INFO] [timer.py:197:stop] 0/2310, RunningAvgSamplesPerSec=6.326284545954589, CurrSamplesPerSec=5.707692153291593, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:57:42,996] [INFO] [timer.py:197:stop] 0/2312, RunningAvgSamplesPerSec=6.326284080838427, CurrSamplesPerSec=5.6587606682746046, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:57:54,323] [INFO] [timer.py:197:stop] 0/2314, RunningAvgSamplesPerSec=6.326284805015643, CurrSamplesPerSec=5.685244668107953, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:58:05,615] [INFO] [timer.py:197:stop] 0/2316, RunningAvgSamplesPerSec=6.326290782416112, CurrSamplesPerSec=5.684297930523931, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:58:16,889] [INFO] [timer.py:197:stop] 0/2318, RunningAvgSamplesPerSec=6.326316142943657, CurrSamplesPerSec=5.723674224372936, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:58:28,266] 
[INFO] [logging.py:68:log_dist] [Rank 0] step=1160, skipped=5, lr=[8.546666666666667e-06], mom=[[0.9, 0.999]] [2022-12-16 23:58:28,267] [INFO] [timer.py:197:stop] 0/2320, RunningAvgSamplesPerSec=6.326323787028186, CurrSamplesPerSec=5.688317476381979, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:58:39,623] [INFO] [timer.py:197:stop] 0/2322, RunningAvgSamplesPerSec=6.326342023787668, CurrSamplesPerSec=5.71290112985152, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:58:50,962] [INFO] [timer.py:197:stop] 0/2324, RunningAvgSamplesPerSec=6.326356330123207, CurrSamplesPerSec=5.708964548907816, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:59:02,276] [INFO] [timer.py:197:stop] 0/2326, RunningAvgSamplesPerSec=6.326361865091982, CurrSamplesPerSec=5.6786080227710025, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:59:13,775] [INFO] [timer.py:197:stop] 0/2328, RunningAvgSamplesPerSec=6.326370235319124, CurrSamplesPerSec=5.678698840919953, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:59:25,300] [INFO] [timer.py:197:stop] 0/2330, RunningAvgSamplesPerSec=6.326394180628012, CurrSamplesPerSec=5.7203590393708605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:59:36,883] [INFO] [timer.py:197:stop] 0/2332, RunningAvgSamplesPerSec=6.32626253016581, CurrSamplesPerSec=5.445651602800221, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:59:48,388] [INFO] [timer.py:197:stop] 0/2334, RunningAvgSamplesPerSec=6.326292110693107, CurrSamplesPerSec=5.704136048880114, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:59:59,841] [INFO] [timer.py:197:stop] 0/2336, RunningAvgSamplesPerSec=6.32631299484464, CurrSamplesPerSec=5.711764553181547, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:00:11,419] [INFO] [timer.py:197:stop] 0/2338, RunningAvgSamplesPerSec=6.326181513763306, CurrSamplesPerSec=5.415030591285196, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:00:22,709] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=1170, skipped=5, lr=[8.524444444444445e-06], mom=[[0.9, 0.999]] [2022-12-17 00:00:22,711] [INFO] [timer.py:197:stop] 0/2340, RunningAvgSamplesPerSec=6.326202325976543, CurrSamplesPerSec=5.74276970829813, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:00:34,006] [INFO] [timer.py:197:stop] 0/2342, RunningAvgSamplesPerSec=6.32620579711919, CurrSamplesPerSec=5.721006406084096, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:00:45,502] [INFO] [timer.py:197:stop] 0/2344, RunningAvgSamplesPerSec=6.326118889510103, CurrSamplesPerSec=5.52402753944322, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:00:56,800] [INFO] [timer.py:197:stop] 0/2346, RunningAvgSamplesPerSec=6.326135872170397, CurrSamplesPerSec=5.710939936974358, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:01:08,272] [INFO] [timer.py:197:stop] 0/2348, RunningAvgSamplesPerSec=6.326128201303538, CurrSamplesPerSec=5.6852942769906845, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:01:19,855] [INFO] [timer.py:197:stop] 0/2350, RunningAvgSamplesPerSec=6.3260915621697915, CurrSamplesPerSec=5.697348595343359, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0114, 'learning_rate': 8.513333333333335e-06, 'epoch': 4.98} [2022-12-17 00:01:31,193] [INFO] [timer.py:197:stop] 0/2352, RunningAvgSamplesPerSec=6.326086616929433, CurrSamplesPerSec=5.684411320157631, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:01:42,756] [INFO] [timer.py:197:stop] 0/2354, RunningAvgSamplesPerSec=6.326080971833527, CurrSamplesPerSec=5.689428581287169, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:01:54,292] [INFO] [timer.py:197:stop] 0/2356, RunningAvgSamplesPerSec=6.3260681795923155, CurrSamplesPerSec=5.702536278019622, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:02:05,644] [INFO] [timer.py:197:stop] 0/2358, RunningAvgSamplesPerSec=6.326060427681191, CurrSamplesPerSec=5.680715367471306, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:02:14,121] [INFO] [logging.py:68:log_dist] [Rank 0] step=1180, skipped=5, lr=[8.502222222222223e-06], mom=[[0.9, 0.999]] [2022-12-17 00:02:14,123] [INFO] [timer.py:197:stop] 0/2360, RunningAvgSamplesPerSec=6.327393469676367, CurrSamplesPerSec=10.255969396928553, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:02:25,787] [INFO] [timer.py:197:stop] 0/2362, RunningAvgSamplesPerSec=6.327398217095025, CurrSamplesPerSec=5.707732202892349, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:02:37,136] [INFO] [timer.py:197:stop] 0/2364, RunningAvgSamplesPerSec=6.327384932114008, CurrSamplesPerSec=5.666322510482579, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:02:48,647] [INFO] [timer.py:197:stop] 0/2366, RunningAvgSamplesPerSec=6.327379114654407, CurrSamplesPerSec=5.681089748134514, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:03:00,366] [INFO] [timer.py:197:stop] 0/2368, RunningAvgSamplesPerSec=6.327323722516558, CurrSamplesPerSec=5.693677831984537, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:03:11,697] [INFO] [timer.py:197:stop] 0/2370, RunningAvgSamplesPerSec=6.327321020826525, CurrSamplesPerSec=5.679773983317212, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:03:23,253] [INFO] [timer.py:197:stop] 0/2372, RunningAvgSamplesPerSec=6.327311747242629, CurrSamplesPerSec=5.681938478914834, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:03:34,741] [INFO] [timer.py:197:stop] 0/2374, RunningAvgSamplesPerSec=6.327319351743848, CurrSamplesPerSec=5.691096535385634, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:03:46,059] [INFO] [timer.py:197:stop] 0/2376, RunningAvgSamplesPerSec=6.32732351833207, CurrSamplesPerSec=5.700592362818308, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:03:57,611] [INFO] [timer.py:197:stop] 0/2378, RunningAvgSamplesPerSec=6.327352322513417, CurrSamplesPerSec=5.712766175697562, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:04:09,069] [INFO] [logging.py:68:log_dist] [Rank 0] step=1190, skipped=5, lr=[8.48e-06], mom=[[0.9, 0.999]] [2022-12-17 00:04:09,071] [INFO] [timer.py:197:stop] 0/2380, RunningAvgSamplesPerSec=6.327346203150038, CurrSamplesPerSec=5.69740954073738, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:04:20,405] [INFO] [timer.py:197:stop] 0/2382, RunningAvgSamplesPerSec=6.327342004866964, CurrSamplesPerSec=5.68034271896564, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:04:31,873] [INFO] [timer.py:197:stop] 0/2384, RunningAvgSamplesPerSec=6.327349613345468, CurrSamplesPerSec=5.699608316218365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:04:43,357] [INFO] [timer.py:197:stop] 0/2386, RunningAvgSamplesPerSec=6.327357594697813, CurrSamplesPerSec=5.7072145141063535, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:04:54,649] [INFO] [timer.py:197:stop] 0/2388, RunningAvgSamplesPerSec=6.327373490088198, CurrSamplesPerSec=5.7044960664789475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:05:06,025] [INFO] [timer.py:197:stop] 0/2390, RunningAvgSamplesPerSec=6.327346305357408, CurrSamplesPerSec=5.649194330102887, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:05:17,540] [INFO] [timer.py:197:stop] 0/2392, RunningAvgSamplesPerSec=6.327245483140265, CurrSamplesPerSec=5.660994413395568, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:05:28,835] [INFO] [timer.py:197:stop] 0/2394, RunningAvgSamplesPerSec=6.327261384200125, CurrSamplesPerSec=5.714668036756897, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:05:40,427] [INFO] [timer.py:197:stop] 0/2396, RunningAvgSamplesPerSec=6.327119563889244, CurrSamplesPerSec=5.436875578629744, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:05:51,965] [INFO] [timer.py:197:stop] 0/2398, RunningAvgSamplesPerSec=6.327112445362932, CurrSamplesPerSec=5.679763167386114, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 00:06:03,374] [INFO] [logging.py:68:log_dist] [Rank 0] step=1200, skipped=5, lr=[8.457777777777778e-06], mom=[[0.9, 0.999]] [2022-12-17 00:06:03,376] [INFO] [timer.py:197:stop] 0/2400, RunningAvgSamplesPerSec=6.327118049781438, CurrSamplesPerSec=5.7093232333914035, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0097, 'learning_rate': 8.457777777777778e-06, 'epoch': 5.08} [2022-12-17 00:06:15,117] [INFO] [timer.py:197:stop] 0/2402, RunningAvgSamplesPerSec=6.326904179662763, CurrSamplesPerSec=5.29673078600285, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:06:26,595] [INFO] [timer.py:197:stop] 0/2404, RunningAvgSamplesPerSec=6.3268834712487205, CurrSamplesPerSec=5.646651775835152, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:06:38,108] [INFO] [timer.py:197:stop] 0/2406, RunningAvgSamplesPerSec=6.326888160520098, CurrSamplesPerSec=5.690182585796365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:06:49,777] [INFO] [timer.py:197:stop] 0/2408, RunningAvgSamplesPerSec=6.326730764415, CurrSamplesPerSec=5.387032140158966, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:07:01,299] [INFO] [timer.py:197:stop] 0/2410, RunningAvgSamplesPerSec=6.326726621316895, CurrSamplesPerSec=5.6720043517375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:07:12,693] [INFO] [timer.py:197:stop] 0/2412, RunningAvgSamplesPerSec=6.326735800305985, CurrSamplesPerSec=5.697340614495457, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:07:24,145] [INFO] [timer.py:197:stop] 0/2414, RunningAvgSamplesPerSec=6.326685750989741, CurrSamplesPerSec=5.598387443582945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:07:35,424] [INFO] [timer.py:197:stop] 0/2416, RunningAvgSamplesPerSec=6.326691411324269, CurrSamplesPerSec=5.688779660662067, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:07:46,756] [INFO] [timer.py:197:stop] 0/2418, RunningAvgSamplesPerSec=6.326693198828202, 
CurrSamplesPerSec=5.704795752146189, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:07:58,143] [INFO] [logging.py:68:log_dist] [Rank 0] step=1210, skipped=5, lr=[8.435555555555555e-06], mom=[[0.9, 0.999]] [2022-12-17 00:07:58,145] [INFO] [timer.py:197:stop] 0/2420, RunningAvgSamplesPerSec=6.3266655254786315, CurrSamplesPerSec=5.633763495373306, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:08:09,440] [INFO] [timer.py:197:stop] 0/2422, RunningAvgSamplesPerSec=6.326681867189228, CurrSamplesPerSec=5.711533646362335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:08:20,823] [INFO] [timer.py:197:stop] 0/2424, RunningAvgSamplesPerSec=6.326712186913518, CurrSamplesPerSec=5.7466206883559385, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:08:32,438] [INFO] [timer.py:197:stop] 0/2426, RunningAvgSamplesPerSec=6.326729864602853, CurrSamplesPerSec=5.731897997137077, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:08:43,746] [INFO] [timer.py:197:stop] 0/2428, RunningAvgSamplesPerSec=6.326729675312963, CurrSamplesPerSec=5.687243194186456, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:08:55,189] [INFO] [timer.py:197:stop] 0/2430, RunningAvgSamplesPerSec=6.326737809957682, CurrSamplesPerSec=5.701821627330498, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:09:06,855] [INFO] [timer.py:197:stop] 0/2432, RunningAvgSamplesPerSec=6.326733080994904, CurrSamplesPerSec=5.672367276741743, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:09:18,164] [INFO] [timer.py:197:stop] 0/2434, RunningAvgSamplesPerSec=6.326742221811101, CurrSamplesPerSec=5.707480991963551, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:09:29,558] [INFO] [timer.py:197:stop] 0/2436, RunningAvgSamplesPerSec=6.326750724338014, CurrSamplesPerSec=5.690528539573331, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:09:40,932] [INFO] [timer.py:197:stop] 0/2438, RunningAvgSamplesPerSec=6.326724669497559, 
CurrSamplesPerSec=5.690591751851742, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:09:52,213] [INFO] [logging.py:68:log_dist] [Rank 0] step=1220, skipped=5, lr=[8.413333333333335e-06], mom=[[0.9, 0.999]] [2022-12-17 00:09:52,214] [INFO] [timer.py:197:stop] 0/2440, RunningAvgSamplesPerSec=6.326738501522213, CurrSamplesPerSec=5.702198068450731, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:10:03,515] [INFO] [timer.py:197:stop] 0/2442, RunningAvgSamplesPerSec=6.326750841170004, CurrSamplesPerSec=5.698653885705367, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:10:14,860] [INFO] [timer.py:197:stop] 0/2444, RunningAvgSamplesPerSec=6.326740054037859, CurrSamplesPerSec=5.718250201156127, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:10:26,173] [INFO] [timer.py:197:stop] 0/2446, RunningAvgSamplesPerSec=6.3267458592431325, CurrSamplesPerSec=5.7044698818484925, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:10:37,439] [INFO] [timer.py:197:stop] 0/2448, RunningAvgSamplesPerSec=6.326768357244894, CurrSamplesPerSec=5.727955490702307, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:10:48,891] [INFO] [timer.py:197:stop] 0/2450, RunningAvgSamplesPerSec=6.32674730298158, CurrSamplesPerSec=5.709237018720549, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0075, 'learning_rate': 8.402222222222223e-06, 'epoch': 5.19} [2022-12-17 00:11:00,220] [INFO] [timer.py:197:stop] 0/2452, RunningAvgSamplesPerSec=6.326746408929464, CurrSamplesPerSec=5.675316745076753, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:11:11,568] [INFO] [timer.py:197:stop] 0/2454, RunningAvgSamplesPerSec=6.326751113377327, CurrSamplesPerSec=5.6697918498087185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:11:22,921] [INFO] [timer.py:197:stop] 0/2456, RunningAvgSamplesPerSec=6.326768920915674, CurrSamplesPerSec=5.7107800478084885, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:11:34,261] [INFO] 
[timer.py:197:stop] 0/2458, RunningAvgSamplesPerSec=6.326777253114961, CurrSamplesPerSec=5.693119946167675, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:11:45,714] [INFO] [logging.py:68:log_dist] [Rank 0] step=1230, skipped=5, lr=[8.391111111111112e-06], mom=[[0.9, 0.999]] [2022-12-17 00:11:45,728] [INFO] [timer.py:197:stop] 0/2460, RunningAvgSamplesPerSec=6.326722606926644, CurrSamplesPerSec=5.568440852071338, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:11:57,328] [INFO] [timer.py:197:stop] 0/2462, RunningAvgSamplesPerSec=6.326719706587058, CurrSamplesPerSec=5.700025616999359, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:12:08,882] [INFO] [timer.py:197:stop] 0/2464, RunningAvgSamplesPerSec=6.326736640937212, CurrSamplesPerSec=5.725507396596043, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:12:20,388] [INFO] [timer.py:197:stop] 0/2466, RunningAvgSamplesPerSec=6.326662702925947, CurrSamplesPerSec=5.535536232496128, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:12:31,932] [INFO] [timer.py:197:stop] 0/2468, RunningAvgSamplesPerSec=6.32667441348783, CurrSamplesPerSec=5.713611263948235, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:12:43,315] [INFO] [timer.py:197:stop] 0/2470, RunningAvgSamplesPerSec=6.326702594844601, CurrSamplesPerSec=5.738342133171381, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:12:54,683] [INFO] [timer.py:197:stop] 0/2472, RunningAvgSamplesPerSec=6.3266968094008575, CurrSamplesPerSec=5.688689725337002, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:13:06,234] [INFO] [timer.py:197:stop] 0/2474, RunningAvgSamplesPerSec=6.326702223524954, CurrSamplesPerSec=5.712893834869376, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:13:17,816] [INFO] [timer.py:197:stop] 0/2476, RunningAvgSamplesPerSec=6.326706690468993, CurrSamplesPerSec=5.693369169956182, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:13:29,361] [INFO] 
[timer.py:197:stop] 0/2478, RunningAvgSamplesPerSec=6.326604120041126, CurrSamplesPerSec=5.5015015024862315, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:13:40,781] [INFO] [logging.py:68:log_dist] [Rank 0] step=1240, skipped=5, lr=[8.36888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 00:13:40,783] [INFO] [timer.py:197:stop] 0/2480, RunningAvgSamplesPerSec=6.326613241704568, CurrSamplesPerSec=5.703245776278132, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:13:52,225] [INFO] [timer.py:197:stop] 0/2482, RunningAvgSamplesPerSec=6.326629215160153, CurrSamplesPerSec=5.705985834359169, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:14:03,577] [INFO] [timer.py:197:stop] 0/2484, RunningAvgSamplesPerSec=6.3266315522699, CurrSamplesPerSec=5.684307800779385, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:14:15,106] [INFO] [timer.py:197:stop] 0/2486, RunningAvgSamplesPerSec=6.326633090727117, CurrSamplesPerSec=5.702784146178658, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:14:26,547] [INFO] [timer.py:197:stop] 0/2488, RunningAvgSamplesPerSec=6.326633770679098, CurrSamplesPerSec=5.690413940741107, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:14:37,978] [INFO] [timer.py:197:stop] 0/2490, RunningAvgSamplesPerSec=6.326596239594797, CurrSamplesPerSec=5.607375858672685, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:14:49,487] [INFO] [timer.py:197:stop] 0/2492, RunningAvgSamplesPerSec=6.326614834209527, CurrSamplesPerSec=5.7160601480076645, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:15:00,811] [INFO] [timer.py:197:stop] 0/2494, RunningAvgSamplesPerSec=6.326637863759786, CurrSamplesPerSec=5.725856193596088, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:15:12,347] [INFO] [timer.py:197:stop] 0/2496, RunningAvgSamplesPerSec=6.3265480474100535, CurrSamplesPerSec=5.50056176595693, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:15:23,670] [INFO] 
[timer.py:197:stop] 0/2498, RunningAvgSamplesPerSec=6.326564531801721, CurrSamplesPerSec=5.718744552837693, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:15:35,029] [INFO] [logging.py:68:log_dist] [Rank 0] step=1250, skipped=5, lr=[8.346666666666668e-06], mom=[[0.9, 0.999]] [2022-12-17 00:15:35,030] [INFO] [timer.py:197:stop] 0/2500, RunningAvgSamplesPerSec=6.326555577673725, CurrSamplesPerSec=5.695495462388746, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0063, 'learning_rate': 8.346666666666668e-06, 'epoch': 5.3} [2022-12-17 00:15:46,532] [INFO] [timer.py:197:stop] 0/2502, RunningAvgSamplesPerSec=6.326484923630128, CurrSamplesPerSec=5.53508263380697, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:15:57,765] [INFO] [timer.py:197:stop] 0/2504, RunningAvgSamplesPerSec=6.3265099226472925, CurrSamplesPerSec=5.736485042851482, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:16:09,067] [INFO] [timer.py:197:stop] 0/2506, RunningAvgSamplesPerSec=6.326523949412188, CurrSamplesPerSec=5.719076686267754, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:16:20,454] [INFO] [timer.py:197:stop] 0/2508, RunningAvgSamplesPerSec=6.326533116804953, CurrSamplesPerSec=5.689745740535148, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:16:31,727] [INFO] [timer.py:197:stop] 0/2510, RunningAvgSamplesPerSec=6.326552905082487, CurrSamplesPerSec=5.716908159224934, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:16:43,073] [INFO] [timer.py:197:stop] 0/2512, RunningAvgSamplesPerSec=6.326582937970287, CurrSamplesPerSec=5.723374992606862, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:16:54,613] [INFO] [timer.py:197:stop] 0/2514, RunningAvgSamplesPerSec=6.326550641893863, CurrSamplesPerSec=5.66221191290154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:17:05,946] [INFO] [timer.py:197:stop] 0/2516, RunningAvgSamplesPerSec=6.326549027844262, CurrSamplesPerSec=5.673992379000022, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 00:17:17,284] [INFO] [timer.py:197:stop] 0/2518, RunningAvgSamplesPerSec=6.326558787750363, CurrSamplesPerSec=5.698325087774245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:17:28,836] [INFO] [logging.py:68:log_dist] [Rank 0] step=1260, skipped=5, lr=[8.324444444444445e-06], mom=[[0.9, 0.999]] [2022-12-17 00:17:28,838] [INFO] [timer.py:197:stop] 0/2520, RunningAvgSamplesPerSec=6.326540122489599, CurrSamplesPerSec=5.690654482982018, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:17:40,139] [INFO] [timer.py:197:stop] 0/2522, RunningAvgSamplesPerSec=6.326553458397724, CurrSamplesPerSec=5.701011744954114, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:17:51,655] [INFO] [timer.py:197:stop] 0/2524, RunningAvgSamplesPerSec=6.3265404713512865, CurrSamplesPerSec=5.667898194852396, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:18:03,195] [INFO] [timer.py:197:stop] 0/2526, RunningAvgSamplesPerSec=6.326507994869215, CurrSamplesPerSec=5.68231398309392, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:18:14,498] [INFO] [timer.py:197:stop] 0/2528, RunningAvgSamplesPerSec=6.326522367660754, CurrSamplesPerSec=5.688211403993173, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:18:25,817] [INFO] [timer.py:197:stop] 0/2530, RunningAvgSamplesPerSec=6.32653360591826, CurrSamplesPerSec=5.712521815482252, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:18:37,184] [INFO] [timer.py:197:stop] 0/2532, RunningAvgSamplesPerSec=6.326514255985787, CurrSamplesPerSec=5.717885522347397, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:18:48,479] [INFO] [timer.py:197:stop] 0/2534, RunningAvgSamplesPerSec=6.3265339901958315, CurrSamplesPerSec=5.713119501609359, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:18:59,747] [INFO] [timer.py:197:stop] 0/2536, RunningAvgSamplesPerSec=6.326563020917872, CurrSamplesPerSec=5.7426670009695355, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 00:19:11,112] [INFO] [timer.py:197:stop] 0/2538, RunningAvgSamplesPerSec=6.326545038393981, CurrSamplesPerSec=5.721956142352332, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:19:22,447] [INFO] [logging.py:68:log_dist] [Rank 0] step=1270, skipped=5, lr=[8.302222222222223e-06], mom=[[0.9, 0.999]] [2022-12-17 00:19:22,449] [INFO] [timer.py:197:stop] 0/2540, RunningAvgSamplesPerSec=6.326540286848705, CurrSamplesPerSec=5.686737648285296, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:19:33,717] [INFO] [timer.py:197:stop] 0/2542, RunningAvgSamplesPerSec=6.326562698910825, CurrSamplesPerSec=5.721906135533029, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:19:45,369] [INFO] [timer.py:197:stop] 0/2544, RunningAvgSamplesPerSec=6.326403229462973, CurrSamplesPerSec=5.661712001273591, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:19:56,656] [INFO] [timer.py:197:stop] 0/2546, RunningAvgSamplesPerSec=6.32642389843164, CurrSamplesPerSec=5.701724981548307, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:20:07,953] [INFO] [timer.py:197:stop] 0/2548, RunningAvgSamplesPerSec=6.326438972781323, CurrSamplesPerSec=5.698616866824313, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:20:19,484] [INFO] [timer.py:197:stop] 0/2550, RunningAvgSamplesPerSec=6.326450768126824, CurrSamplesPerSec=5.7128675730879115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0063, 'learning_rate': 8.291111111111112e-06, 'epoch': 5.4} [2022-12-17 00:20:30,772] [INFO] [timer.py:197:stop] 0/2552, RunningAvgSamplesPerSec=6.3264719429827245, CurrSamplesPerSec=5.7136346138201715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:20:42,057] [INFO] [timer.py:197:stop] 0/2554, RunningAvgSamplesPerSec=6.326493149760229, CurrSamplesPerSec=5.718664875793541, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:20:53,716] [INFO] [timer.py:197:stop] 0/2556, 
RunningAvgSamplesPerSec=6.326505639720777, CurrSamplesPerSec=5.708368460113684, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:21:05,123] [INFO] [timer.py:197:stop] 0/2558, RunningAvgSamplesPerSec=6.3265038209555655, CurrSamplesPerSec=5.665879514564959, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:21:16,744] [INFO] [logging.py:68:log_dist] [Rank 0] step=1280, skipped=5, lr=[8.28e-06], mom=[[0.9, 0.999]] [2022-12-17 00:21:16,752] [INFO] [timer.py:197:stop] 0/2560, RunningAvgSamplesPerSec=6.326358618419738, CurrSamplesPerSec=5.413214403540145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:21:28,250] [INFO] [timer.py:197:stop] 0/2562, RunningAvgSamplesPerSec=6.326359770774308, CurrSamplesPerSec=5.694195483572104, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:21:39,600] [INFO] [timer.py:197:stop] 0/2564, RunningAvgSamplesPerSec=6.326349992478936, CurrSamplesPerSec=5.6756863343254045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:21:50,969] [INFO] [timer.py:197:stop] 0/2566, RunningAvgSamplesPerSec=6.326319567798299, CurrSamplesPerSec=5.643538377656659, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:22:02,666] [INFO] [timer.py:197:stop] 0/2568, RunningAvgSamplesPerSec=6.326323086683221, CurrSamplesPerSec=5.697887233328356, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:22:14,206] [INFO] [timer.py:197:stop] 0/2570, RunningAvgSamplesPerSec=6.326330355222502, CurrSamplesPerSec=5.7042182307129705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:22:25,738] [INFO] [timer.py:197:stop] 0/2572, RunningAvgSamplesPerSec=6.326225319823644, CurrSamplesPerSec=5.473016049069588, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:22:37,126] [INFO] [timer.py:197:stop] 0/2574, RunningAvgSamplesPerSec=6.326225277051989, CurrSamplesPerSec=5.6848013572597695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:22:48,440] [INFO] [timer.py:197:stop] 0/2576, 
RunningAvgSamplesPerSec=6.326231527655515, CurrSamplesPerSec=5.674560437142187, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:22:59,916] [INFO] [timer.py:197:stop] 0/2578, RunningAvgSamplesPerSec=6.326152255092386, CurrSamplesPerSec=5.522841235953951, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:23:11,199] [INFO] [logging.py:68:log_dist] [Rank 0] step=1290, skipped=5, lr=[8.25777777777778e-06], mom=[[0.9, 0.999]] [2022-12-17 00:23:11,201] [INFO] [timer.py:197:stop] 0/2580, RunningAvgSamplesPerSec=6.326175072075225, CurrSamplesPerSec=5.720998846531288, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:23:22,498] [INFO] [timer.py:197:stop] 0/2582, RunningAvgSamplesPerSec=6.326183771322497, CurrSamplesPerSec=5.698068414658625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:23:33,927] [INFO] [timer.py:197:stop] 0/2584, RunningAvgSamplesPerSec=6.326139785337271, CurrSamplesPerSec=5.58557952242445, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:23:45,167] [INFO] [timer.py:197:stop] 0/2586, RunningAvgSamplesPerSec=6.326179241497298, CurrSamplesPerSec=5.7387481945854235, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:23:56,451] [INFO] [timer.py:197:stop] 0/2588, RunningAvgSamplesPerSec=6.32620099438922, CurrSamplesPerSec=5.721770755603695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:24:07,781] [INFO] [timer.py:197:stop] 0/2590, RunningAvgSamplesPerSec=6.326192997876933, CurrSamplesPerSec=5.647460068432117, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:24:19,159] [INFO] [timer.py:197:stop] 0/2592, RunningAvgSamplesPerSec=6.326204338875871, CurrSamplesPerSec=5.710935805986585, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:24:30,481] [INFO] [timer.py:197:stop] 0/2594, RunningAvgSamplesPerSec=6.326213224289392, CurrSamplesPerSec=5.6991071035421985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:24:41,792] [INFO] [timer.py:197:stop] 0/2596, 
RunningAvgSamplesPerSec=6.326214609800813, CurrSamplesPerSec=5.689834744426627, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:24:53,083] [INFO] [timer.py:197:stop] 0/2598, RunningAvgSamplesPerSec=6.3262259837086505, CurrSamplesPerSec=5.7008393355908, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:25:04,381] [INFO] [logging.py:68:log_dist] [Rank 0] step=1300, skipped=5, lr=[8.235555555555557e-06], mom=[[0.9, 0.999]] [2022-12-17 00:25:04,383] [INFO] [timer.py:197:stop] 0/2600, RunningAvgSamplesPerSec=6.3262398954655445, CurrSamplesPerSec=5.711441774946321, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0057, 'learning_rate': 8.235555555555557e-06, 'epoch': 5.51} [2022-12-17 00:25:15,742] [INFO] [timer.py:197:stop] 0/2602, RunningAvgSamplesPerSec=6.326244054453919, CurrSamplesPerSec=5.675839944017053, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:25:27,029] [INFO] [timer.py:197:stop] 0/2604, RunningAvgSamplesPerSec=6.326264010208493, CurrSamplesPerSec=5.715521719669567, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:25:38,582] [INFO] [timer.py:197:stop] 0/2606, RunningAvgSamplesPerSec=6.326260293048078, CurrSamplesPerSec=5.690133374032339, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:25:50,225] [INFO] [timer.py:197:stop] 0/2608, RunningAvgSamplesPerSec=6.3262267886673405, CurrSamplesPerSec=5.683333218298441, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:26:01,512] [INFO] [timer.py:197:stop] 0/2610, RunningAvgSamplesPerSec=6.326239613463648, CurrSamplesPerSec=5.710197913057936, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:26:12,880] [INFO] [timer.py:197:stop] 0/2612, RunningAvgSamplesPerSec=6.32625914149536, CurrSamplesPerSec=5.713643370071351, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:26:24,506] [INFO] [timer.py:197:stop] 0/2614, RunningAvgSamplesPerSec=6.32611627280062, CurrSamplesPerSec=5.696622671515938, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-17 00:26:35,814] [INFO] [timer.py:197:stop] 0/2616, RunningAvgSamplesPerSec=6.326125500638407, CurrSamplesPerSec=5.70426065588973, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:26:47,096] [INFO] [timer.py:197:stop] 0/2618, RunningAvgSamplesPerSec=6.326139626599832, CurrSamplesPerSec=5.702213815140462, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:26:58,708] [INFO] [logging.py:68:log_dist] [Rank 0] step=1310, skipped=5, lr=[8.213333333333335e-06], mom=[[0.9, 0.999]] [2022-12-17 00:26:58,710] [INFO] [timer.py:197:stop] 0/2620, RunningAvgSamplesPerSec=6.326002105283931, CurrSamplesPerSec=5.712053091414867, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:27:10,158] [INFO] [timer.py:197:stop] 0/2622, RunningAvgSamplesPerSec=6.3259629171482885, CurrSamplesPerSec=5.592408362554529, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:27:21,571] [INFO] [timer.py:197:stop] 0/2624, RunningAvgSamplesPerSec=6.325924896719555, CurrSamplesPerSec=5.606229828905596, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:27:33,009] [INFO] [timer.py:197:stop] 0/2626, RunningAvgSamplesPerSec=6.325937791742814, CurrSamplesPerSec=5.702413926803422, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:27:44,296] [INFO] [timer.py:197:stop] 0/2628, RunningAvgSamplesPerSec=6.325958003199458, CurrSamplesPerSec=5.73155556162893, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:27:55,576] [INFO] [timer.py:197:stop] 0/2630, RunningAvgSamplesPerSec=6.325981450593395, CurrSamplesPerSec=5.719802494288624, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:28:06,852] [INFO] [timer.py:197:stop] 0/2632, RunningAvgSamplesPerSec=6.326006891574107, CurrSamplesPerSec=5.729226910935692, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:28:18,152] [INFO] [timer.py:197:stop] 0/2634, RunningAvgSamplesPerSec=6.326020555896606, CurrSamplesPerSec=5.698490086849737, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 
00:28:29,411] [INFO] [timer.py:197:stop] 0/2636, RunningAvgSamplesPerSec=6.326053117175442, CurrSamplesPerSec=5.732383449386664, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:28:40,700] [INFO] [timer.py:197:stop] 0/2638, RunningAvgSamplesPerSec=6.32607484050823, CurrSamplesPerSec=5.717845086546815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:28:51,958] [INFO] [logging.py:68:log_dist] [Rank 0] step=1320, skipped=5, lr=[8.191111111111112e-06], mom=[[0.9, 0.999]] [2022-12-17 00:28:51,959] [INFO] [timer.py:197:stop] 0/2640, RunningAvgSamplesPerSec=6.326099519405758, CurrSamplesPerSec=5.725360855912853, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:29:03,227] [INFO] [timer.py:197:stop] 0/2642, RunningAvgSamplesPerSec=6.326120596379347, CurrSamplesPerSec=5.725664691942976, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:29:14,824] [INFO] [timer.py:197:stop] 0/2644, RunningAvgSamplesPerSec=6.326140929669695, CurrSamplesPerSec=5.704167563757328, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:29:26,095] [INFO] [timer.py:197:stop] 0/2646, RunningAvgSamplesPerSec=6.32616732833622, CurrSamplesPerSec=5.720641863301964, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:29:37,643] [INFO] [timer.py:197:stop] 0/2648, RunningAvgSamplesPerSec=6.326194882392596, CurrSamplesPerSec=5.723545594812338, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:29:48,979] [INFO] [timer.py:197:stop] 0/2650, RunningAvgSamplesPerSec=6.326209225498618, CurrSamplesPerSec=5.724713724280205, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0058, 'learning_rate': 8.18e-06, 'epoch': 5.61} [2022-12-17 00:30:00,270] [INFO] [timer.py:197:stop] 0/2652, RunningAvgSamplesPerSec=6.326229768870487, CurrSamplesPerSec=5.710958890994876, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:30:11,729] [INFO] [timer.py:197:stop] 0/2654, RunningAvgSamplesPerSec=6.326249240897963, CurrSamplesPerSec=5.708205558930267, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:30:23,103] [INFO] [timer.py:197:stop] 0/2656, RunningAvgSamplesPerSec=6.326252214933294, CurrSamplesPerSec=5.699375244988747, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:30:34,432] [INFO] [timer.py:197:stop] 0/2658, RunningAvgSamplesPerSec=6.326251581706483, CurrSamplesPerSec=5.664131887462741, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:30:45,723] [INFO] [logging.py:68:log_dist] [Rank 0] step=1330, skipped=5, lr=[8.16888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 00:30:45,725] [INFO] [timer.py:197:stop] 0/2660, RunningAvgSamplesPerSec=6.326267794800439, CurrSamplesPerSec=5.690308755138885, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:30:57,318] [INFO] [timer.py:197:stop] 0/2662, RunningAvgSamplesPerSec=6.326142217190136, CurrSamplesPerSec=5.70153291081011, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:31:08,612] [INFO] [timer.py:197:stop] 0/2664, RunningAvgSamplesPerSec=6.326157916158148, CurrSamplesPerSec=5.708166959267274, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:31:19,918] [INFO] [timer.py:197:stop] 0/2666, RunningAvgSamplesPerSec=6.3261677520262936, CurrSamplesPerSec=5.702544273430368, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:31:31,445] [INFO] [timer.py:197:stop] 0/2668, RunningAvgSamplesPerSec=6.326075930517208, CurrSamplesPerSec=5.737835066469297, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:31:42,765] [INFO] [timer.py:197:stop] 0/2670, RunningAvgSamplesPerSec=6.326079650124289, CurrSamplesPerSec=5.704554255407273, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:31:54,072] [INFO] [timer.py:197:stop] 0/2672, RunningAvgSamplesPerSec=6.326084022141423, CurrSamplesPerSec=5.679570891042092, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:32:05,392] [INFO] [timer.py:197:stop] 0/2674, RunningAvgSamplesPerSec=6.32608739737266, CurrSamplesPerSec=5.69966592139152, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:32:16,734] [INFO] [timer.py:197:stop] 0/2676, RunningAvgSamplesPerSec=6.3260808079396345, CurrSamplesPerSec=5.668778662460667, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:32:28,067] [INFO] [timer.py:197:stop] 0/2678, RunningAvgSamplesPerSec=6.326079861892401, CurrSamplesPerSec=5.675158364418903, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:32:39,429] [INFO] [logging.py:68:log_dist] [Rank 0] step=1340, skipped=5, lr=[8.146666666666668e-06], mom=[[0.9, 0.999]] [2022-12-17 00:32:39,431] [INFO] [timer.py:197:stop] 0/2680, RunningAvgSamplesPerSec=6.32606307682349, CurrSamplesPerSec=5.7083980795096085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:32:50,725] [INFO] [timer.py:197:stop] 0/2682, RunningAvgSamplesPerSec=6.326072935799696, CurrSamplesPerSec=5.679356998906212, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:33:02,471] [INFO] [timer.py:197:stop] 0/2684, RunningAvgSamplesPerSec=6.325883959083556, CurrSamplesPerSec=5.299005576723335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:33:13,813] [INFO] [timer.py:197:stop] 0/2686, RunningAvgSamplesPerSec=6.325889876937557, CurrSamplesPerSec=5.705633390227677, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:33:25,326] [INFO] [timer.py:197:stop] 0/2688, RunningAvgSamplesPerSec=6.3258965851993825, CurrSamplesPerSec=5.709519715426119, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:33:36,701] [INFO] [timer.py:197:stop] 0/2690, RunningAvgSamplesPerSec=6.325876082499624, CurrSamplesPerSec=5.64687770381663, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:33:48,182] [INFO] [timer.py:197:stop] 0/2692, RunningAvgSamplesPerSec=6.325893590208427, CurrSamplesPerSec=5.724229081879015, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:33:59,565] [INFO] [timer.py:197:stop] 0/2694, RunningAvgSamplesPerSec=6.325900009531055, CurrSamplesPerSec=5.684401208796394, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:34:10,944] [INFO] [timer.py:197:stop] 0/2696, RunningAvgSamplesPerSec=6.325868582660771, CurrSamplesPerSec=5.625398530103999, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:34:22,205] [INFO] [timer.py:197:stop] 0/2698, RunningAvgSamplesPerSec=6.32588495236265, CurrSamplesPerSec=5.71918854325984, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:34:33,514] [INFO] [logging.py:68:log_dist] [Rank 0] step=1350, skipped=5, lr=[8.124444444444445e-06], mom=[[0.9, 0.999]] [2022-12-17 00:34:33,516] [INFO] [timer.py:197:stop] 0/2700, RunningAvgSamplesPerSec=6.325898947710833, CurrSamplesPerSec=5.706381991812721, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0065, 'learning_rate': 8.124444444444445e-06, 'epoch': 5.72} [2022-12-17 00:34:44,835] [INFO] [timer.py:197:stop] 0/2702, RunningAvgSamplesPerSec=6.325905253726988, CurrSamplesPerSec=5.686284468334417, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:34:56,149] [INFO] [timer.py:197:stop] 0/2704, RunningAvgSamplesPerSec=6.325912620762956, CurrSamplesPerSec=5.71820001547374, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:35:07,474] [INFO] [timer.py:197:stop] 0/2706, RunningAvgSamplesPerSec=6.325917059353624, CurrSamplesPerSec=5.686473585688181, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:35:18,765] [INFO] [timer.py:197:stop] 0/2708, RunningAvgSamplesPerSec=6.325928300743048, CurrSamplesPerSec=5.692940528208803, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:35:30,079] [INFO] [timer.py:197:stop] 0/2710, RunningAvgSamplesPerSec=6.325935598815497, CurrSamplesPerSec=5.698778737242706, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:35:41,360] [INFO] [timer.py:197:stop] 0/2712, RunningAvgSamplesPerSec=6.325953492621701, CurrSamplesPerSec=5.725917017592284, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:35:52,664] [INFO] [timer.py:197:stop] 0/2714, 
RunningAvgSamplesPerSec=6.3259665608292295, CurrSamplesPerSec=5.7136144258988875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:36:03,998] [INFO] [timer.py:197:stop] 0/2716, RunningAvgSamplesPerSec=6.325965339171459, CurrSamplesPerSec=5.69463711984461, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:36:15,336] [INFO] [timer.py:197:stop] 0/2718, RunningAvgSamplesPerSec=6.325963267000056, CurrSamplesPerSec=5.6800157902202875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:36:26,643] [INFO] [logging.py:68:log_dist] [Rank 0] step=1360, skipped=5, lr=[8.102222222222223e-06], mom=[[0.9, 0.999]]
[2022-12-17 00:36:26,644] [INFO] [timer.py:197:stop] 0/2720, RunningAvgSamplesPerSec=6.325974389268866, CurrSamplesPerSec=5.712358434045041, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:36:38,000] [INFO] [timer.py:197:stop] 0/2722, RunningAvgSamplesPerSec=6.325966469616858, CurrSamplesPerSec=5.687886943472245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:36:49,304] [INFO] [timer.py:197:stop] 0/2724, RunningAvgSamplesPerSec=6.3259805196125285, CurrSamplesPerSec=5.710168274981378, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:37:00,609] [INFO] [timer.py:197:stop] 0/2726, RunningAvgSamplesPerSec=6.325991601930382, CurrSamplesPerSec=5.717981011568095, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:37:11,924] [INFO] [timer.py:197:stop] 0/2728, RunningAvgSamplesPerSec=6.325997268890384, CurrSamplesPerSec=5.687382247117594, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:37:23,234] [INFO] [timer.py:197:stop] 0/2730, RunningAvgSamplesPerSec=6.326005740177389, CurrSamplesPerSec=5.691548793241471, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:37:34,542] [INFO] [timer.py:197:stop] 0/2732, RunningAvgSamplesPerSec=6.326015268521907, CurrSamplesPerSec=5.710145682228034, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:37:45,849] [INFO] [timer.py:197:stop] 0/2734, RunningAvgSamplesPerSec=6.326026052423808, CurrSamplesPerSec=5.69643166967437, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:37:57,147] [INFO] [timer.py:197:stop] 0/2736, RunningAvgSamplesPerSec=6.326034238434523, CurrSamplesPerSec=5.711207005865852, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:38:08,454] [INFO] [timer.py:197:stop] 0/2738, RunningAvgSamplesPerSec=6.326044722066875, CurrSamplesPerSec=5.714813057261718, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:38:19,744] [INFO] [logging.py:68:log_dist] [Rank 0] step=1370, skipped=5, lr=[8.08e-06], mom=[[0.9, 0.999]]
[2022-12-17 00:38:19,746] [INFO] [timer.py:197:stop] 0/2740, RunningAvgSamplesPerSec=6.326062588023212, CurrSamplesPerSec=5.714517427614674, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:38:31,062] [INFO] [timer.py:197:stop] 0/2742, RunningAvgSamplesPerSec=6.326068755109588, CurrSamplesPerSec=5.6926896518027075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:38:42,377] [INFO] [timer.py:197:stop] 0/2744, RunningAvgSamplesPerSec=6.326077004074519, CurrSamplesPerSec=5.698056319417846, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:38:53,733] [INFO] [timer.py:197:stop] 0/2746, RunningAvgSamplesPerSec=6.326095865876457, CurrSamplesPerSec=5.708167930321577, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:39:05,083] [INFO] [timer.py:197:stop] 0/2748, RunningAvgSamplesPerSec=6.326088331147206, CurrSamplesPerSec=5.6702160544353335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:39:16,406] [INFO] [timer.py:197:stop] 0/2750, RunningAvgSamplesPerSec=6.32609210532745, CurrSamplesPerSec=5.6996581760903995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0059, 'learning_rate': 8.06888888888889e-06, 'epoch': 5.83}
[2022-12-17 00:39:27,683] [INFO] [timer.py:197:stop] 0/2752, RunningAvgSamplesPerSec=6.326102930538578, CurrSamplesPerSec=5.700973726824695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:39:39,245] [INFO] [timer.py:197:stop] 0/2754, RunningAvgSamplesPerSec=6.326103142679591, CurrSamplesPerSec=5.6921966559580985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:39:50,546] [INFO] [timer.py:197:stop] 0/2756, RunningAvgSamplesPerSec=6.326109758767559, CurrSamplesPerSec=5.68873505431431, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:40:01,928] [INFO] [timer.py:197:stop] 0/2758, RunningAvgSamplesPerSec=6.326152357749966, CurrSamplesPerSec=5.717524543449726, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:40:13,229] [INFO] [logging.py:68:log_dist] [Rank 0] step=1380, skipped=5, lr=[8.057777777777778e-06], mom=[[0.9, 0.999]]
[2022-12-17 00:40:13,231] [INFO] [timer.py:197:stop] 0/2760, RunningAvgSamplesPerSec=6.3261563606599145, CurrSamplesPerSec=5.686595494752139, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:40:24,553] [INFO] [timer.py:197:stop] 0/2762, RunningAvgSamplesPerSec=6.326160416652262, CurrSamplesPerSec=5.694735458470453, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:40:35,862] [INFO] [timer.py:197:stop] 0/2764, RunningAvgSamplesPerSec=6.326171019397204, CurrSamplesPerSec=5.696176617567328, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:40:47,152] [INFO] [timer.py:197:stop] 0/2766, RunningAvgSamplesPerSec=6.326189476157271, CurrSamplesPerSec=5.714541758117771, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:40:58,431] [INFO] [timer.py:197:stop] 0/2768, RunningAvgSamplesPerSec=6.3262123829027574, CurrSamplesPerSec=5.732181963257296, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:41:09,717] [INFO] [timer.py:197:stop] 0/2770, RunningAvgSamplesPerSec=6.3262342581050754, CurrSamplesPerSec=5.734106085065293, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:41:21,023] [INFO] [timer.py:197:stop] 0/2772, RunningAvgSamplesPerSec=6.326247134308609, CurrSamplesPerSec=5.71070205041586, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:41:32,419] [INFO] [timer.py:197:stop] 0/2774, RunningAvgSamplesPerSec=6.326217747933542, CurrSamplesPerSec=5.609474965653818, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:41:43,694] [INFO] [timer.py:197:stop] 0/2776, RunningAvgSamplesPerSec=6.326237779781224, CurrSamplesPerSec=5.714292623567177, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:41:54,989] [INFO] [timer.py:197:stop] 0/2778, RunningAvgSamplesPerSec=6.326246832178511, CurrSamplesPerSec=5.694157073152024, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:42:06,306] [INFO] [logging.py:68:log_dist] [Rank 0] step=1390, skipped=5, lr=[8.035555555555556e-06], mom=[[0.9, 0.999]]
[2022-12-17 00:42:06,307] [INFO] [timer.py:197:stop] 0/2780, RunningAvgSamplesPerSec=6.326252131073057, CurrSamplesPerSec=5.698359441632837, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:42:17,627] [INFO] [timer.py:197:stop] 0/2782, RunningAvgSamplesPerSec=6.326263541773177, CurrSamplesPerSec=5.712742832923151, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:42:28,915] [INFO] [timer.py:197:stop] 0/2784, RunningAvgSamplesPerSec=6.3262820242871625, CurrSamplesPerSec=5.711086226394405, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:42:40,235] [INFO] [timer.py:197:stop] 0/2786, RunningAvgSamplesPerSec=6.326278470128036, CurrSamplesPerSec=5.689995633865518, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:42:51,531] [INFO] [timer.py:197:stop] 0/2788, RunningAvgSamplesPerSec=6.3262812600351, CurrSamplesPerSec=5.698626544916717, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:43:02,861] [INFO] [timer.py:197:stop] 0/2790, RunningAvgSamplesPerSec=6.326284751834609, CurrSamplesPerSec=5.682586802008966, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:43:14,152] [INFO] [timer.py:197:stop] 0/2792, RunningAvgSamplesPerSec=6.326300456502428, CurrSamplesPerSec=5.711068486580379, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:43:25,489] [INFO] [timer.py:197:stop] 0/2794, RunningAvgSamplesPerSec=6.326296045859089, CurrSamplesPerSec=5.68929401032924, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:43:36,844] [INFO] [timer.py:197:stop] 0/2796, RunningAvgSamplesPerSec=6.326283444929573, CurrSamplesPerSec=5.678466275380915, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:43:48,194] [INFO] [timer.py:197:stop] 0/2798, RunningAvgSamplesPerSec=6.326275113134897, CurrSamplesPerSec=5.681810996789058, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:43:59,488] [INFO] [logging.py:68:log_dist] [Rank 0] step=1400, skipped=5, lr=[8.013333333333333e-06], mom=[[0.9, 0.999]]
[2022-12-17 00:43:59,490] [INFO] [timer.py:197:stop] 0/2800, RunningAvgSamplesPerSec=6.326278524053802, CurrSamplesPerSec=5.708140012642158, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0061, 'learning_rate': 8.013333333333333e-06, 'epoch': 5.93}
[2022-12-17 00:44:10,818] [INFO] [timer.py:197:stop] 0/2802, RunningAvgSamplesPerSec=6.326272564841731, CurrSamplesPerSec=5.689947872632715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:44:22,144] [INFO] [timer.py:197:stop] 0/2804, RunningAvgSamplesPerSec=6.3262738518884944, CurrSamplesPerSec=5.6964355379390215, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:44:33,451] [INFO] [timer.py:197:stop] 0/2806, RunningAvgSamplesPerSec=6.326284669686423, CurrSamplesPerSec=5.71170111271115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:44:44,774] [INFO] [timer.py:197:stop] 0/2808, RunningAvgSamplesPerSec=6.326287380248273, CurrSamplesPerSec=5.694567777450047, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:44:56,077] [INFO] [timer.py:197:stop] 0/2810, RunningAvgSamplesPerSec=6.326298963116309, CurrSamplesPerSec=5.7054769510047585, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:45:07,384] [INFO] [timer.py:197:stop] 0/2812, RunningAvgSamplesPerSec=6.326309438125834, CurrSamplesPerSec=5.707904058468435, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:45:18,694] [INFO] [timer.py:197:stop] 0/2814, RunningAvgSamplesPerSec=6.326318695543646, CurrSamplesPerSec=5.699163488592521, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:45:30,016] [INFO] [timer.py:197:stop] 0/2816, RunningAvgSamplesPerSec=6.326308895338787, CurrSamplesPerSec=5.6812999230670735, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:45:41,403] [INFO] [timer.py:197:stop] 0/2818, RunningAvgSamplesPerSec=6.3263158441663165, CurrSamplesPerSec=5.697318123135298, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:45:52,683] [INFO] [logging.py:68:log_dist] [Rank 0] step=1410, skipped=5, lr=[7.991111111111111e-06], mom=[[0.9, 0.999]]
[2022-12-17 00:45:52,684] [INFO] [timer.py:197:stop] 0/2820, RunningAvgSamplesPerSec=6.326330930150586, CurrSamplesPerSec=5.711552361272398, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:46:04,105] [INFO] [timer.py:197:stop] 0/2822, RunningAvgSamplesPerSec=6.326337358566065, CurrSamplesPerSec=5.7063317716262025, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:46:15,409] [INFO] [timer.py:197:stop] 0/2824, RunningAvgSamplesPerSec=6.326350499991001, CurrSamplesPerSec=5.700049824256106, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:46:26,712] [INFO] [timer.py:197:stop] 0/2826, RunningAvgSamplesPerSec=6.326362116471989, CurrSamplesPerSec=5.696192331017952, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:46:37,981] [INFO] [timer.py:197:stop] 0/2828, RunningAvgSamplesPerSec=6.326378782489265, CurrSamplesPerSec=5.72583909471339, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:46:49,286] [INFO] [timer.py:197:stop] 0/2830, RunningAvgSamplesPerSec=6.3263905912443725, CurrSamplesPerSec=5.712548803494018, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:46:57,752] [INFO] [timer.py:197:stop] 0/2832, RunningAvgSamplesPerSec=6.327500906774722, CurrSamplesPerSec=10.213131621964372, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:47:09,028] [INFO] [timer.py:197:stop] 0/2834, RunningAvgSamplesPerSec=6.327524589867564, CurrSamplesPerSec=5.7415259191163495, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:47:20,330] [INFO] [timer.py:197:stop] 0/2836, RunningAvgSamplesPerSec=6.32753757473291, CurrSamplesPerSec=5.706063945559606, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:47:31,620] [INFO] [timer.py:197:stop] 0/2838, RunningAvgSamplesPerSec=6.32755530949647, CurrSamplesPerSec=5.708585999887545, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:47:42,882] [INFO] [logging.py:68:log_dist] [Rank 0] step=1420, skipped=5, lr=[7.968888888888889e-06], mom=[[0.9, 0.999]]
[2022-12-17 00:47:42,884] [INFO] [timer.py:197:stop] 0/2840, RunningAvgSamplesPerSec=6.3275774644451, CurrSamplesPerSec=5.717691143431064, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:47:54,164] [INFO] [timer.py:197:stop] 0/2842, RunningAvgSamplesPerSec=6.327600320149804, CurrSamplesPerSec=5.730524830517137, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:48:05,413] [INFO] [timer.py:197:stop] 0/2844, RunningAvgSamplesPerSec=6.327630379247216, CurrSamplesPerSec=5.741550480149671, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:48:16,683] [INFO] [timer.py:197:stop] 0/2846, RunningAvgSamplesPerSec=6.327656478388394, CurrSamplesPerSec=5.728110231499037, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:48:27,968] [INFO] [timer.py:197:stop] 0/2848, RunningAvgSamplesPerSec=6.327676846505894, CurrSamplesPerSec=5.711667570043478, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:48:39,248] [INFO] [timer.py:197:stop] 0/2850, RunningAvgSamplesPerSec=6.327698323490243, CurrSamplesPerSec=5.715694531454376, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0069, 'learning_rate': 7.957777777777779e-06, 'epoch': 6.04}
[2022-12-17 00:48:50,688] [INFO] [timer.py:197:stop] 0/2852, RunningAvgSamplesPerSec=6.327681825148393, CurrSamplesPerSec=5.633257481872813, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:49:01,947] [INFO] [timer.py:197:stop] 0/2854, RunningAvgSamplesPerSec=6.327705434093166, CurrSamplesPerSec=5.713875177257864, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:49:13,201] [INFO] [timer.py:197:stop] 0/2856, RunningAvgSamplesPerSec=6.327737753470916, CurrSamplesPerSec=5.747293209022641, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:49:24,470] [INFO] [timer.py:197:stop] 0/2858, RunningAvgSamplesPerSec=6.327763769334866, CurrSamplesPerSec=5.725460258524562, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:49:35,694] [INFO] [logging.py:68:log_dist] [Rank 0] step=1430, skipped=5, lr=[7.946666666666666e-06], mom=[[0.9, 0.999]]
[2022-12-17 00:49:35,696] [INFO] [timer.py:197:stop] 0/2860, RunningAvgSamplesPerSec=6.327800843423795, CurrSamplesPerSec=5.735983206986001, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:49:46,948] [INFO] [timer.py:197:stop] 0/2862, RunningAvgSamplesPerSec=6.327833495419585, CurrSamplesPerSec=5.748474746050126, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:49:58,260] [INFO] [timer.py:197:stop] 0/2864, RunningAvgSamplesPerSec=6.327840557961029, CurrSamplesPerSec=5.688244430653757, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:50:09,571] [INFO] [timer.py:197:stop] 0/2866, RunningAvgSamplesPerSec=6.327847886071225, CurrSamplesPerSec=5.6982658162456055, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:50:20,879] [INFO] [timer.py:197:stop] 0/2868, RunningAvgSamplesPerSec=6.327856789431566, CurrSamplesPerSec=5.690743032739117, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:50:32,132] [INFO] [timer.py:197:stop] 0/2870, RunningAvgSamplesPerSec=6.327882540236703, CurrSamplesPerSec=5.7294487337776525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:50:43,407] [INFO] [timer.py:197:stop] 0/2872, RunningAvgSamplesPerSec=6.327907951304773, CurrSamplesPerSec=5.717148998402179, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:50:54,684] [INFO] [timer.py:197:stop] 0/2874, RunningAvgSamplesPerSec=6.327916338895959, CurrSamplesPerSec=5.7034958874174935, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:51:05,993] [INFO] [timer.py:197:stop] 0/2876, RunningAvgSamplesPerSec=6.327923831231903, CurrSamplesPerSec=5.691680333164429, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:51:17,290] [INFO] [timer.py:197:stop] 0/2878, RunningAvgSamplesPerSec=6.32793017975812, CurrSamplesPerSec=5.69989199664201, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:51:28,548] [INFO] [logging.py:68:log_dist] [Rank 0] step=1440, skipped=5, lr=[7.924444444444444e-06], mom=[[0.9, 0.999]]
[2022-12-17 00:51:28,549] [INFO] [timer.py:197:stop] 0/2880, RunningAvgSamplesPerSec=6.327947443597498, CurrSamplesPerSec=5.707494340780261, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:51:39,887] [INFO] [timer.py:197:stop] 0/2882, RunningAvgSamplesPerSec=6.327938244011372, CurrSamplesPerSec=5.671862933828582, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:51:51,197] [INFO] [timer.py:197:stop] 0/2884, RunningAvgSamplesPerSec=6.327946344601117, CurrSamplesPerSec=5.723690822156731, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:52:02,505] [INFO] [timer.py:197:stop] 0/2886, RunningAvgSamplesPerSec=6.327955051767076, CurrSamplesPerSec=5.702812738417894, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:52:13,978] [INFO] [timer.py:197:stop] 0/2888, RunningAvgSamplesPerSec=6.327962604578867, CurrSamplesPerSec=5.709647715323768, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:52:25,230] [INFO] [timer.py:197:stop] 0/2890, RunningAvgSamplesPerSec=6.327974743924771, CurrSamplesPerSec=5.729162837413578, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:52:36,541] [INFO] [timer.py:197:stop] 0/2892, RunningAvgSamplesPerSec=6.327982849160652, CurrSamplesPerSec=5.6937498096324335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:52:47,845] [INFO] [timer.py:197:stop] 0/2894, RunningAvgSamplesPerSec=6.32799236964038, CurrSamplesPerSec=5.705542678687533, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:52:59,138] [INFO] [timer.py:197:stop] 0/2896, RunningAvgSamplesPerSec=6.328000738339299, CurrSamplesPerSec=5.686164499770041, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:53:10,439] [INFO] [timer.py:197:stop] 0/2898, RunningAvgSamplesPerSec=6.328011091842845, CurrSamplesPerSec=5.701617439717182, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:53:21,757] [INFO] [logging.py:68:log_dist] [Rank 0] step=1450, skipped=5, lr=[7.902222222222223e-06], mom=[[0.9, 0.999]]
[2022-12-17 00:53:21,759] [INFO] [timer.py:197:stop] 0/2900, RunningAvgSamplesPerSec=6.328008276848249, CurrSamplesPerSec=5.682868068684691, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0042, 'learning_rate': 7.902222222222223e-06, 'epoch': 6.14}
[2022-12-17 00:53:33,101] [INFO] [timer.py:197:stop] 0/2902, RunningAvgSamplesPerSec=6.328002365385474, CurrSamplesPerSec=5.690576310554871, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:53:44,416] [INFO] [timer.py:197:stop] 0/2904, RunningAvgSamplesPerSec=6.328008007908925, CurrSamplesPerSec=5.688421865117699, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:53:55,706] [INFO] [timer.py:197:stop] 0/2906, RunningAvgSamplesPerSec=6.32802756176131, CurrSamplesPerSec=5.7326287772585305, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:54:07,008] [INFO] [timer.py:197:stop] 0/2908, RunningAvgSamplesPerSec=6.328038384867272, CurrSamplesPerSec=5.706880117177133, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:54:18,318] [INFO] [timer.py:197:stop] 0/2910, RunningAvgSamplesPerSec=6.32804560058214, CurrSamplesPerSec=5.695414015063271, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:54:29,633] [INFO] [timer.py:197:stop] 0/2912, RunningAvgSamplesPerSec=6.328050628624341, CurrSamplesPerSec=5.713400637270775, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:54:40,963] [INFO] [timer.py:197:stop] 0/2914, RunningAvgSamplesPerSec=6.328050203289852, CurrSamplesPerSec=5.678421350142625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:54:52,263] [INFO] [timer.py:197:stop] 0/2916, RunningAvgSamplesPerSec=6.328062633120781, CurrSamplesPerSec=5.725012607653451, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:55:03,579] [INFO] [timer.py:197:stop] 0/2918, RunningAvgSamplesPerSec=6.328068609223025, CurrSamplesPerSec=5.691808017180035, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:55:14,905] [INFO] [logging.py:68:log_dist] [Rank 0] step=1460, skipped=5, lr=[7.88e-06], mom=[[0.9, 0.999]]
[2022-12-17 00:55:14,907] [INFO] [timer.py:197:stop] 0/2920, RunningAvgSamplesPerSec=6.328069419533752, CurrSamplesPerSec=5.68772930662516, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:55:26,187] [INFO] [timer.py:197:stop] 0/2922, RunningAvgSamplesPerSec=6.328090564627229, CurrSamplesPerSec=5.727310949560448, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:55:37,505] [INFO] [timer.py:197:stop] 0/2924, RunningAvgSamplesPerSec=6.328096957132478, CurrSamplesPerSec=5.695935849996165, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:55:48,808] [INFO] [timer.py:197:stop] 0/2926, RunningAvgSamplesPerSec=6.328107082929748, CurrSamplesPerSec=5.70548034649371, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:56:00,138] [INFO] [timer.py:197:stop] 0/2928, RunningAvgSamplesPerSec=6.328098667754056, CurrSamplesPerSec=5.683542596569245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:56:11,544] [INFO] [timer.py:197:stop] 0/2930, RunningAvgSamplesPerSec=6.328100464276708, CurrSamplesPerSec=5.68885778373489, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:56:22,880] [INFO] [timer.py:197:stop] 0/2932, RunningAvgSamplesPerSec=6.328096690063188, CurrSamplesPerSec=5.688263475398798, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:56:34,206] [INFO] [timer.py:197:stop] 0/2934, RunningAvgSamplesPerSec=6.328093117601106, CurrSamplesPerSec=5.680745662337398, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:56:45,527] [INFO] [timer.py:197:stop] 0/2936, RunningAvgSamplesPerSec=6.328096112538995, CurrSamplesPerSec=5.6893685300193875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:56:56,846] [INFO] [timer.py:197:stop] 0/2938, RunningAvgSamplesPerSec=6.328101176150009, CurrSamplesPerSec=5.711024745044085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:57:08,127] [INFO] [logging.py:68:log_dist] [Rank 0] step=1470, skipped=5, lr=[7.857777777777778e-06], mom=[[0.9, 0.999]]
[2022-12-17 00:57:08,129] [INFO] [timer.py:197:stop] 0/2940, RunningAvgSamplesPerSec=6.3281204012544725, CurrSamplesPerSec=5.70905342657683, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:57:19,476] [INFO] [timer.py:197:stop] 0/2942, RunningAvgSamplesPerSec=6.328114388296346, CurrSamplesPerSec=5.67222967619481, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:57:30,750] [INFO] [timer.py:197:stop] 0/2944, RunningAvgSamplesPerSec=6.3281320847324265, CurrSamplesPerSec=5.725135197931068, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:57:42,101] [INFO] [timer.py:197:stop] 0/2946, RunningAvgSamplesPerSec=6.328123390598462, CurrSamplesPerSec=5.693491616380447, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:57:53,409] [INFO] [timer.py:197:stop] 0/2948, RunningAvgSamplesPerSec=6.328131816672485, CurrSamplesPerSec=5.72276832667421, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:58:04,719] [INFO] [timer.py:197:stop] 0/2950, RunningAvgSamplesPerSec=6.328138734874885, CurrSamplesPerSec=5.704884257576782, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0036, 'learning_rate': 7.846666666666667e-06, 'epoch': 6.25}
[2022-12-17 00:58:16,022] [INFO] [timer.py:197:stop] 0/2952, RunningAvgSamplesPerSec=6.328143859408265, CurrSamplesPerSec=5.697003020922854, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:58:27,330] [INFO] [timer.py:197:stop] 0/2954, RunningAvgSamplesPerSec=6.328152288098311, CurrSamplesPerSec=5.707560357664528, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:58:38,599] [INFO] [timer.py:197:stop] 0/2956, RunningAvgSamplesPerSec=6.3281775886127, CurrSamplesPerSec=5.746036882029965, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:58:49,882] [INFO] [timer.py:197:stop] 0/2958, RunningAvgSamplesPerSec=6.328195107651893, CurrSamplesPerSec=5.729716802821961, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:59:01,199] [INFO] [logging.py:68:log_dist] [Rank 0] step=1480, skipped=5, lr=[7.835555555555556e-06], mom=[[0.9, 0.999]]
[2022-12-17 00:59:01,200] [INFO] [timer.py:197:stop] 0/2960, RunningAvgSamplesPerSec=6.328192492674206, CurrSamplesPerSec=5.675167243100677, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:59:12,524] [INFO] [timer.py:197:stop] 0/2962, RunningAvgSamplesPerSec=6.328193774002038, CurrSamplesPerSec=5.702767911729754, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:59:23,848] [INFO] [timer.py:197:stop] 0/2964, RunningAvgSamplesPerSec=6.328196359985549, CurrSamplesPerSec=5.69108712416596, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:59:35,223] [INFO] [timer.py:197:stop] 0/2966, RunningAvgSamplesPerSec=6.328176776650489, CurrSamplesPerSec=5.6551737562799635, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:59:46,518] [INFO] [timer.py:197:stop] 0/2968, RunningAvgSamplesPerSec=6.328183787121418, CurrSamplesPerSec=5.708379142483334, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 00:59:57,870] [INFO] [timer.py:197:stop] 0/2970, RunningAvgSamplesPerSec=6.328175802544738, CurrSamplesPerSec=5.6941065847717365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:00:09,190] [INFO] [timer.py:197:stop] 0/2972, RunningAvgSamplesPerSec=6.328178575983986, CurrSamplesPerSec=5.687292114979654, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:00:20,444] [INFO] [timer.py:197:stop] 0/2974, RunningAvgSamplesPerSec=6.32819556655408, CurrSamplesPerSec=5.728663748848603, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:00:31,777] [INFO] [timer.py:197:stop] 0/2976, RunningAvgSamplesPerSec=6.328196985255409, CurrSamplesPerSec=5.705050849219822, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:00:43,090] [INFO] [timer.py:197:stop] 0/2978, RunningAvgSamplesPerSec=6.328196506480381, CurrSamplesPerSec=5.679157781284493, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:00:54,453] [INFO] [logging.py:68:log_dist] [Rank 0] step=1490, skipped=5, lr=[7.813333333333334e-06], mom=[[0.9, 0.999]]
[2022-12-17 01:00:54,455] [INFO] [timer.py:197:stop] 0/2980, RunningAvgSamplesPerSec=6.328180755791063, CurrSamplesPerSec=5.674602902137952, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:01:05,768] [INFO] [timer.py:197:stop] 0/2982, RunningAvgSamplesPerSec=6.328186646636752, CurrSamplesPerSec=5.689070221829174, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:01:17,073] [INFO] [timer.py:197:stop] 0/2984, RunningAvgSamplesPerSec=6.328195956601233, CurrSamplesPerSec=5.706514703516415, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:01:28,369] [INFO] [timer.py:197:stop] 0/2986, RunningAvgSamplesPerSec=6.328196240513767, CurrSamplesPerSec=5.6864718992358085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:01:39,680] [INFO] [timer.py:197:stop] 0/2988, RunningAvgSamplesPerSec=6.328195471669826, CurrSamplesPerSec=5.705590702084778, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:01:50,966] [INFO] [timer.py:197:stop] 0/2990, RunningAvgSamplesPerSec=6.328212465188259, CurrSamplesPerSec=5.726522885359659, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:02:02,300] [INFO] [timer.py:197:stop] 0/2992, RunningAvgSamplesPerSec=6.328211943571726, CurrSamplesPerSec=5.682009919512811, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:02:13,628] [INFO] [timer.py:197:stop] 0/2994, RunningAvgSamplesPerSec=6.328213010624356, CurrSamplesPerSec=5.679872770727517, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:02:24,994] [INFO] [timer.py:197:stop] 0/2996, RunningAvgSamplesPerSec=6.328196403619549, CurrSamplesPerSec=5.635151715865987, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:02:36,321] [INFO] [timer.py:197:stop] 0/2998, RunningAvgSamplesPerSec=6.328198002255571, CurrSamplesPerSec=5.68202483326497, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:02:47,639] [INFO] [logging.py:68:log_dist] [Rank 0] step=1500, skipped=5, lr=[7.791111111111111e-06], mom=[[0.9, 0.999]]
[2022-12-17 01:02:47,641] [INFO] [timer.py:197:stop] 0/3000, RunningAvgSamplesPerSec=6.328200625826034, CurrSamplesPerSec=5.707326150252773, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.004, 'learning_rate': 7.791111111111111e-06, 'epoch': 6.36}
[2022-12-17 01:02:59,034] [INFO] [timer.py:197:stop] 0/3002, RunningAvgSamplesPerSec=6.328199386225926, CurrSamplesPerSec=5.695768339988552, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:03:10,433] [INFO] [timer.py:197:stop] 0/3004, RunningAvgSamplesPerSec=6.32818221561934, CurrSamplesPerSec=5.649236891850217, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:03:21,836] [INFO] [timer.py:197:stop] 0/3006, RunningAvgSamplesPerSec=6.328171893389803, CurrSamplesPerSec=5.648525790615511, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:03:33,230] [INFO] [timer.py:197:stop] 0/3008, RunningAvgSamplesPerSec=6.32817485497259, CurrSamplesPerSec=5.675626332797857, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:03:44,527] [INFO] [timer.py:197:stop] 0/3010, RunningAvgSamplesPerSec=6.328192386500218, CurrSamplesPerSec=5.713108315122546, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:03:55,891] [INFO] [timer.py:197:stop] 0/3012, RunningAvgSamplesPerSec=6.32818153197785, CurrSamplesPerSec=5.6502407265569765, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:04:07,186] [INFO] [timer.py:197:stop] 0/3014, RunningAvgSamplesPerSec=6.32820043048318, CurrSamplesPerSec=5.70614302926414, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:04:18,568] [INFO] [timer.py:197:stop] 0/3016, RunningAvgSamplesPerSec=6.328197524571121, CurrSamplesPerSec=5.681629164259772, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:04:29,905] [INFO] [timer.py:197:stop] 0/3018, RunningAvgSamplesPerSec=6.328206598740154, CurrSamplesPerSec=5.702575286144794, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:04:41,344] [INFO] [logging.py:68:log_dist] [Rank 0] step=1510, skipped=5, lr=[7.768888888888889e-06], mom=[[0.9, 0.999]]
[2022-12-17 01:04:41,346] [INFO] [timer.py:197:stop] 0/3020, RunningAvgSamplesPerSec=6.328211056766832, CurrSamplesPerSec=5.692277528461339, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:04:52,754] [INFO] [timer.py:197:stop] 0/3022, RunningAvgSamplesPerSec=6.3281936764155375, CurrSamplesPerSec=5.664522732458062, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:05:04,136] [INFO] [timer.py:197:stop] 0/3024, RunningAvgSamplesPerSec=6.328204971771824, CurrSamplesPerSec=5.694208770316031, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:05:15,553] [INFO] [timer.py:197:stop] 0/3026, RunningAvgSamplesPerSec=6.328206809337348, CurrSamplesPerSec=5.689582694558557, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:05:26,858] [INFO] [timer.py:197:stop] 0/3028, RunningAvgSamplesPerSec=6.328218086355139, CurrSamplesPerSec=5.7045443147146075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:05:38,271] [INFO] [timer.py:197:stop] 0/3030, RunningAvgSamplesPerSec=6.328196316180822, CurrSamplesPerSec=5.621920650997765, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:05:49,639] [INFO] [timer.py:197:stop] 0/3032, RunningAvgSamplesPerSec=6.3281938507722595, CurrSamplesPerSec=5.68351371589056, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:06:00,994] [INFO] [timer.py:197:stop] 0/3034, RunningAvgSamplesPerSec=6.328193529434247, CurrSamplesPerSec=5.692921210664173, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:06:12,324] [INFO] [timer.py:197:stop] 0/3036, RunningAvgSamplesPerSec=6.32819211447804, CurrSamplesPerSec=5.683399640079139, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:06:23,619] [INFO] [timer.py:197:stop] 0/3038, RunningAvgSamplesPerSec=6.32820489915555, CurrSamplesPerSec=5.697037117024608, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:06:34,993] [INFO] [logging.py:68:log_dist] [Rank 0] step=1520, skipped=5, lr=[7.746666666666666e-06], mom=[[0.9, 0.999]]
[2022-12-17 01:06:34,995] [INFO] [timer.py:197:stop] 0/3040, RunningAvgSamplesPerSec=6.328199343076803, CurrSamplesPerSec=5.664262163134862, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:06:46,295] [INFO] [timer.py:197:stop] 0/3042, RunningAvgSamplesPerSec=6.328224462098267, CurrSamplesPerSec=5.734026714300936, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:06:57,577] [INFO] [timer.py:197:stop] 0/3044, RunningAvgSamplesPerSec=6.328244394666527, CurrSamplesPerSec=5.7183171979788465, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:07:09,060] [INFO] [timer.py:197:stop] 0/3046, RunningAvgSamplesPerSec=6.3282673969346055, CurrSamplesPerSec=5.719744725116697, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:07:20,368] [INFO] [timer.py:197:stop] 0/3048, RunningAvgSamplesPerSec=6.328277404458231, CurrSamplesPerSec=5.707639482854105, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:07:31,625] [INFO] [timer.py:197:stop] 0/3050, RunningAvgSamplesPerSec=6.32830059436814, CurrSamplesPerSec=5.704340174492353, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0043, 'learning_rate': 7.735555555555557e-06, 'epoch': 6.46}
[2022-12-17 01:07:42,903] [INFO] [timer.py:197:stop] 0/3052, RunningAvgSamplesPerSec=6.328322627436005, CurrSamplesPerSec=5.717866522321677, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:07:54,263] [INFO] [timer.py:197:stop] 0/3054, RunningAvgSamplesPerSec=6.328310776045788, CurrSamplesPerSec=5.644038407696543, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:08:05,606] [INFO] [timer.py:197:stop] 0/3056, RunningAvgSamplesPerSec=6.328307480005151, CurrSamplesPerSec=5.682056585385183, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:08:16,946] [INFO] [timer.py:197:stop] 0/3058, RunningAvgSamplesPerSec=6.328328640712785, CurrSamplesPerSec=5.737794838599024, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:08:28,584] [INFO] [logging.py:68:log_dist] [Rank 0] step=1530, skipped=5, lr=[7.724444444444446e-06], mom=[[0.9, 0.999]]
[2022-12-17 01:08:28,586] [INFO] [timer.py:197:stop] 0/3060, RunningAvgSamplesPerSec=6.328350255214509, CurrSamplesPerSec=5.725933139832026, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:08:39,910] [INFO] [timer.py:197:stop] 0/3062, RunningAvgSamplesPerSec=6.328353663895335, CurrSamplesPerSec=5.690856679653279, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:08:51,210] [INFO] [timer.py:197:stop] 0/3064, RunningAvgSamplesPerSec=6.3283712330026605, CurrSamplesPerSec=5.732193714185728, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:09:02,696] [INFO] [timer.py:197:stop] 0/3066, RunningAvgSamplesPerSec=6.328348230192032, CurrSamplesPerSec=5.670931429706914, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:09:14,003] [INFO] [timer.py:197:stop] 0/3068, RunningAvgSamplesPerSec=6.328349799635841, CurrSamplesPerSec=5.696333514216841, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:09:25,298] [INFO] [timer.py:197:stop] 0/3070, RunningAvgSamplesPerSec=6.328363329240027, CurrSamplesPerSec=5.728342480612314, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:09:36,650] [INFO] [timer.py:197:stop] 0/3072, RunningAvgSamplesPerSec=6.328353736271112, CurrSamplesPerSec=5.70964358620523, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:09:47,990] [INFO] [timer.py:197:stop] 0/3074, RunningAvgSamplesPerSec=6.328349460626023, CurrSamplesPerSec=5.692681442553798, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:09:59,296] [INFO] [timer.py:197:stop] 0/3076, RunningAvgSamplesPerSec=6.328346360291402, CurrSamplesPerSec=5.692624944247637, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:10:10,941] [INFO] [timer.py:197:stop] 0/3078, RunningAvgSamplesPerSec=6.328340149333154, CurrSamplesPerSec=5.694005610697106, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:10:22,226] [INFO] [logging.py:68:log_dist] [Rank 0] step=1540, skipped=5, lr=[7.702222222222223e-06], mom=[[0.9, 0.999]]
[2022-12-17 01:10:22,228] [INFO] [timer.py:197:stop] 0/3080, RunningAvgSamplesPerSec=6.328350389748184, CurrSamplesPerSec=5.702016866649183, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:10:33,719] [INFO] [timer.py:197:stop] 0/3082, RunningAvgSamplesPerSec=6.328292988477743, CurrSamplesPerSec=5.544842368600772, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:10:45,263] [INFO] [timer.py:197:stop] 0/3084, RunningAvgSamplesPerSec=6.32829325268185, CurrSamplesPerSec=5.685042388422377, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:10:56,821] [INFO] [timer.py:197:stop] 0/3086, RunningAvgSamplesPerSec=6.3283058528128615, CurrSamplesPerSec=5.722950849799248, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:11:08,163] [INFO] [timer.py:197:stop] 0/3088, RunningAvgSamplesPerSec=6.32829141345936, CurrSamplesPerSec=5.654485696188213, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:11:19,396] [INFO] [timer.py:197:stop] 0/3090, RunningAvgSamplesPerSec=6.328328365422752, CurrSamplesPerSec=5.7478484709252315, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:11:30,683] [INFO] [timer.py:197:stop] 0/3092, RunningAvgSamplesPerSec=6.328336061377952, CurrSamplesPerSec=5.695775349589328, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:11:42,330] [INFO] [timer.py:197:stop] 0/3094, RunningAvgSamplesPerSec=6.3282051707003575, CurrSamplesPerSec=5.388241062793341, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:11:53,643] [INFO] [timer.py:197:stop] 0/3096, RunningAvgSamplesPerSec=6.328209741459312, CurrSamplesPerSec=5.7090888812825025, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:12:04,963] [INFO] [timer.py:197:stop] 0/3098, RunningAvgSamplesPerSec=6.328211787634497, CurrSamplesPerSec=5.703349986587, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:12:16,613] [INFO] [logging.py:68:log_dist] [Rank 0] step=1550, skipped=5, lr=[7.680000000000001e-06], mom=[[0.9, 0.999]]
[2022-12-17 01:12:16,615] [INFO] [timer.py:197:stop] 0/3100, RunningAvgSamplesPerSec=6.328228541001663, CurrSamplesPerSec=5.715261548115209, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0049, 'learning_rate': 7.680000000000001e-06, 'epoch': 6.57}
[2022-12-17 01:12:27,913] [INFO] [timer.py:197:stop] 0/3102, RunningAvgSamplesPerSec=6.328239308792876, CurrSamplesPerSec=5.701619135165028, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:12:39,435] [INFO] [timer.py:197:stop] 0/3104, RunningAvgSamplesPerSec=6.32823721816558, CurrSamplesPerSec=5.669716884049624, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:12:51,005] [INFO] [timer.py:197:stop] 0/3106, RunningAvgSamplesPerSec=6.328228348857222, CurrSamplesPerSec=5.703074928081486, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:13:02,338] [INFO] [timer.py:197:stop] 0/3108, RunningAvgSamplesPerSec=6.328225401121336, CurrSamplesPerSec=5.68624785088645, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 01:13:13,623] [INFO] [timer.py:197:stop] 0/3110, RunningAvgSamplesPerSec=6.328242571516347, CurrSamplesPerSec=5.699047815744143, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:13:25,020] [INFO] [timer.py:197:stop] 0/3112, RunningAvgSamplesPerSec=6.328233285287763, CurrSamplesPerSec=5.688660792324933, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:13:36,334] [INFO] [timer.py:197:stop] 0/3114, RunningAvgSamplesPerSec=6.328238544947004, CurrSamplesPerSec=5.686951132873655, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:13:47,651] [INFO] [timer.py:197:stop] 0/3116, RunningAvgSamplesPerSec=6.3282439130296275, CurrSamplesPerSec=5.702263962870334, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:13:59,310] [INFO] [timer.py:197:stop] 0/3118, RunningAvgSamplesPerSec=6.328232184084257, CurrSamplesPerSec=5.701733701334038, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:14:10,649] [INFO] [logging.py:68:log_dist] [Rank 0] step=1560, skipped=5, lr=[7.657777777777779e-06], mom=[[0.9, 0.999]] [2022-12-17 01:14:10,650] [INFO] [timer.py:197:stop] 0/3120, RunningAvgSamplesPerSec=6.328226877412343, CurrSamplesPerSec=5.668705160002777, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:14:22,001] [INFO] [timer.py:197:stop] 0/3122, RunningAvgSamplesPerSec=6.32823370378307, CurrSamplesPerSec=5.694993522906592, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:14:33,559] [INFO] [timer.py:197:stop] 0/3124, RunningAvgSamplesPerSec=6.328224777775604, CurrSamplesPerSec=5.713356616797894, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:14:44,886] [INFO] [timer.py:197:stop] 0/3126, RunningAvgSamplesPerSec=6.328225809771268, CurrSamplesPerSec=5.680624003958813, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:14:56,182] [INFO] [timer.py:197:stop] 0/3128, RunningAvgSamplesPerSec=6.3282387265643685, CurrSamplesPerSec=5.708427456427208, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 01:15:07,854] [INFO] [timer.py:197:stop] 0/3130, RunningAvgSamplesPerSec=6.32810360356806, CurrSamplesPerSec=5.710048025286301, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:15:19,160] [INFO] [timer.py:197:stop] 0/3132, RunningAvgSamplesPerSec=6.328104647297075, CurrSamplesPerSec=5.685813295606887, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:15:30,463] [INFO] [timer.py:197:stop] 0/3134, RunningAvgSamplesPerSec=6.328114366376503, CurrSamplesPerSec=5.7077627866023075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:15:41,961] [INFO] [timer.py:197:stop] 0/3136, RunningAvgSamplesPerSec=6.328043790125609, CurrSamplesPerSec=5.7158843930675225, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:15:53,281] [INFO] [timer.py:197:stop] 0/3138, RunningAvgSamplesPerSec=6.328046788533697, CurrSamplesPerSec=5.686063807171815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:16:04,605] [INFO] [logging.py:68:log_dist] [Rank 0] step=1570, skipped=5, lr=[7.635555555555556e-06], mom=[[0.9, 0.999]] [2022-12-17 01:16:04,607] [INFO] [timer.py:197:stop] 0/3140, RunningAvgSamplesPerSec=6.328046055998555, CurrSamplesPerSec=5.676481834464272, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:16:16,050] [INFO] [timer.py:197:stop] 0/3142, RunningAvgSamplesPerSec=6.327998340066359, CurrSamplesPerSec=5.698629690303827, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:16:27,327] [INFO] [timer.py:197:stop] 0/3144, RunningAvgSamplesPerSec=6.328019294096658, CurrSamplesPerSec=5.7174458746110774, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:16:38,661] [INFO] [timer.py:197:stop] 0/3146, RunningAvgSamplesPerSec=6.328018157319334, CurrSamplesPerSec=5.6957975870577, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:16:50,065] [INFO] [timer.py:197:stop] 0/3148, RunningAvgSamplesPerSec=6.327986933018697, CurrSamplesPerSec=5.6882716719106, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 01:17:01,366] [INFO] [timer.py:197:stop] 0/3150, RunningAvgSamplesPerSec=6.327998111905243, CurrSamplesPerSec=5.713494031017451, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0044, 'learning_rate': 7.624444444444445e-06, 'epoch': 6.67} [2022-12-17 01:17:12,671] [INFO] [timer.py:197:stop] 0/3152, RunningAvgSamplesPerSec=6.32800782192515, CurrSamplesPerSec=5.712859062377203, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:17:23,999] [INFO] [timer.py:197:stop] 0/3154, RunningAvgSamplesPerSec=6.328007133184659, CurrSamplesPerSec=5.704862191582309, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:17:35,292] [INFO] [timer.py:197:stop] 0/3156, RunningAvgSamplesPerSec=6.328008343341138, CurrSamplesPerSec=5.702299091032531, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:17:46,716] [INFO] [timer.py:197:stop] 0/3158, RunningAvgSamplesPerSec=6.327974298784042, CurrSamplesPerSec=5.598834661803304, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:17:58,351] [INFO] [logging.py:68:log_dist] [Rank 0] step=1580, skipped=5, lr=[7.613333333333334e-06], mom=[[0.9, 0.999]] [2022-12-17 01:17:58,353] [INFO] [timer.py:197:stop] 0/3160, RunningAvgSamplesPerSec=6.327970421835056, CurrSamplesPerSec=5.6944965039264135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:18:09,724] [INFO] [timer.py:197:stop] 0/3162, RunningAvgSamplesPerSec=6.327974121817204, CurrSamplesPerSec=5.690142058399431, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:18:21,064] [INFO] [timer.py:197:stop] 0/3164, RunningAvgSamplesPerSec=6.327969293519854, CurrSamplesPerSec=5.670093169884984, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:18:32,571] [INFO] [timer.py:197:stop] 0/3166, RunningAvgSamplesPerSec=6.327975653807134, CurrSamplesPerSec=5.716188928224968, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:18:43,904] [INFO] [timer.py:197:stop] 0/3168, RunningAvgSamplesPerSec=6.32798179357277, 
CurrSamplesPerSec=5.698376134854511, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:18:55,575] [INFO] [timer.py:197:stop] 0/3170, RunningAvgSamplesPerSec=6.327846338624374, CurrSamplesPerSec=5.3722874932535065, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:19:07,004] [INFO] [timer.py:197:stop] 0/3172, RunningAvgSamplesPerSec=6.327857912207166, CurrSamplesPerSec=5.710076690410414, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:19:18,288] [INFO] [timer.py:197:stop] 0/3174, RunningAvgSamplesPerSec=6.32786939326772, CurrSamplesPerSec=5.717158983073055, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:19:29,608] [INFO] [timer.py:197:stop] 0/3176, RunningAvgSamplesPerSec=6.327860409614954, CurrSamplesPerSec=5.64593135025226, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:19:40,950] [INFO] [timer.py:197:stop] 0/3178, RunningAvgSamplesPerSec=6.327863214845376, CurrSamplesPerSec=5.690820003295321, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:19:52,413] [INFO] [logging.py:68:log_dist] [Rank 0] step=1590, skipped=5, lr=[7.5911111111111115e-06], mom=[[0.9, 0.999]] [2022-12-17 01:19:52,414] [INFO] [timer.py:197:stop] 0/3180, RunningAvgSamplesPerSec=6.327857851576632, CurrSamplesPerSec=5.690702979919093, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:20:03,819] [INFO] [timer.py:197:stop] 0/3182, RunningAvgSamplesPerSec=6.327821539117611, CurrSamplesPerSec=5.589438273183318, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:20:15,308] [INFO] [timer.py:197:stop] 0/3184, RunningAvgSamplesPerSec=6.3278335370182965, CurrSamplesPerSec=5.70394527009692, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:20:26,668] [INFO] [timer.py:197:stop] 0/3186, RunningAvgSamplesPerSec=6.327832117311527, CurrSamplesPerSec=5.6870434229343845, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:20:38,217] [INFO] [timer.py:197:stop] 0/3188, RunningAvgSamplesPerSec=6.327744950423371, 
CurrSamplesPerSec=5.49062223728194, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:20:49,700] [INFO] [timer.py:197:stop] 0/3190, RunningAvgSamplesPerSec=6.327757491639702, CurrSamplesPerSec=5.707426141117586, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:21:01,022] [INFO] [timer.py:197:stop] 0/3192, RunningAvgSamplesPerSec=6.32775339708403, CurrSamplesPerSec=5.679734325104576, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:21:12,701] [INFO] [timer.py:197:stop] 0/3194, RunningAvgSamplesPerSec=6.327616548561185, CurrSamplesPerSec=5.348186843516585, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:21:24,002] [INFO] [timer.py:197:stop] 0/3196, RunningAvgSamplesPerSec=6.327615046230956, CurrSamplesPerSec=5.6908863588942635, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:21:35,314] [INFO] [timer.py:197:stop] 0/3198, RunningAvgSamplesPerSec=6.32762007083936, CurrSamplesPerSec=5.7030361554478, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:21:46,722] [INFO] [logging.py:68:log_dist] [Rank 0] step=1600, skipped=5, lr=[7.56888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 01:21:46,724] [INFO] [timer.py:197:stop] 0/3200, RunningAvgSamplesPerSec=6.32757363728904, CurrSamplesPerSec=5.5781339101464695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0042, 'learning_rate': 7.56888888888889e-06, 'epoch': 6.78} [2022-12-17 01:21:57,995] [INFO] [timer.py:197:stop] 0/3202, RunningAvgSamplesPerSec=6.327589379341759, CurrSamplesPerSec=5.700374705537238, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:22:09,310] [INFO] [timer.py:197:stop] 0/3204, RunningAvgSamplesPerSec=6.32759287258235, CurrSamplesPerSec=5.696930235574727, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:22:20,680] [INFO] [timer.py:197:stop] 0/3206, RunningAvgSamplesPerSec=6.32757518739998, CurrSamplesPerSec=5.629936601948145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:22:31,975] [INFO] [timer.py:197:stop] 
0/3208, RunningAvgSamplesPerSec=6.327587198267436, CurrSamplesPerSec=5.728729522817179, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:22:43,303] [INFO] [timer.py:197:stop] 0/3210, RunningAvgSamplesPerSec=6.3275805587083065, CurrSamplesPerSec=5.692876056414973, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:22:54,687] [INFO] [timer.py:197:stop] 0/3212, RunningAvgSamplesPerSec=6.327552061273541, CurrSamplesPerSec=5.62546384037998, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:23:05,975] [INFO] [timer.py:197:stop] 0/3214, RunningAvgSamplesPerSec=6.327566489107612, CurrSamplesPerSec=5.727871645706151, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:23:17,250] [INFO] [timer.py:197:stop] 0/3216, RunningAvgSamplesPerSec=6.32758572146028, CurrSamplesPerSec=5.724705666582072, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:23:28,663] [INFO] [timer.py:197:stop] 0/3218, RunningAvgSamplesPerSec=6.327547048615964, CurrSamplesPerSec=5.602575408343329, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:23:39,974] [INFO] [logging.py:68:log_dist] [Rank 0] step=1610, skipped=5, lr=[7.5466666666666675e-06], mom=[[0.9, 0.999]] [2022-12-17 01:23:39,976] [INFO] [timer.py:197:stop] 0/3220, RunningAvgSamplesPerSec=6.327555916639667, CurrSamplesPerSec=5.693980971633932, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:23:51,452] [INFO] [timer.py:197:stop] 0/3222, RunningAvgSamplesPerSec=6.327556455274264, CurrSamplesPerSec=5.676542814405839, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:24:02,783] [INFO] [timer.py:197:stop] 0/3224, RunningAvgSamplesPerSec=6.327543073056566, CurrSamplesPerSec=5.64955790892777, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:24:14,310] [INFO] [timer.py:197:stop] 0/3226, RunningAvgSamplesPerSec=6.327543805243693, CurrSamplesPerSec=5.700171105707942, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:24:25,643] [INFO] [timer.py:197:stop] 0/3228, 
RunningAvgSamplesPerSec=6.327543698754075, CurrSamplesPerSec=5.690341564909424, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:24:37,337] [INFO] [timer.py:197:stop] 0/3230, RunningAvgSamplesPerSec=6.327403019420782, CurrSamplesPerSec=5.337525799770517, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:24:48,635] [INFO] [timer.py:197:stop] 0/3232, RunningAvgSamplesPerSec=6.3274164819432155, CurrSamplesPerSec=5.710538044157391, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:24:59,955] [INFO] [timer.py:197:stop] 0/3234, RunningAvgSamplesPerSec=6.327423771765399, CurrSamplesPerSec=5.703635493871505, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:25:11,634] [INFO] [timer.py:197:stop] 0/3236, RunningAvgSamplesPerSec=6.327421480528153, CurrSamplesPerSec=5.694893725552477, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:25:22,965] [INFO] [timer.py:197:stop] 0/3238, RunningAvgSamplesPerSec=6.327420600905887, CurrSamplesPerSec=5.67650440169151, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:25:34,557] [INFO] [logging.py:68:log_dist] [Rank 0] step=1620, skipped=5, lr=[7.524444444444445e-06], mom=[[0.9, 0.999]] [2022-12-17 01:25:34,559] [INFO] [timer.py:197:stop] 0/3240, RunningAvgSamplesPerSec=6.327429179698878, CurrSamplesPerSec=5.699057253311078, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:25:46,133] [INFO] [timer.py:197:stop] 0/3242, RunningAvgSamplesPerSec=6.32741687103007, CurrSamplesPerSec=5.676417974985021, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:25:57,425] [INFO] [timer.py:197:stop] 0/3244, RunningAvgSamplesPerSec=6.327418780991531, CurrSamplesPerSec=5.702237071948988, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:26:08,782] [INFO] [timer.py:197:stop] 0/3246, RunningAvgSamplesPerSec=6.327418209197686, CurrSamplesPerSec=5.68327979321014, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:26:20,094] [INFO] [timer.py:197:stop] 0/3248, 
RunningAvgSamplesPerSec=6.327419606512401, CurrSamplesPerSec=5.695729183225739, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:26:31,392] [INFO] [timer.py:197:stop] 0/3250, RunningAvgSamplesPerSec=6.327426160437134, CurrSamplesPerSec=5.701693978081709, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0047, 'learning_rate': 7.513333333333334e-06, 'epoch': 6.89} [2022-12-17 01:26:42,901] [INFO] [timer.py:197:stop] 0/3252, RunningAvgSamplesPerSec=6.327439818024092, CurrSamplesPerSec=5.7171709160181425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:26:54,483] [INFO] [timer.py:197:stop] 0/3254, RunningAvgSamplesPerSec=6.327398825876986, CurrSamplesPerSec=5.6259394486595, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:27:05,859] [INFO] [timer.py:197:stop] 0/3256, RunningAvgSamplesPerSec=6.327402350044329, CurrSamplesPerSec=5.708757421255074, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:27:17,373] [INFO] [timer.py:197:stop] 0/3258, RunningAvgSamplesPerSec=6.327417754863015, CurrSamplesPerSec=5.714677282811018, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:27:28,757] [INFO] [logging.py:68:log_dist] [Rank 0] step=1630, skipped=5, lr=[7.502222222222223e-06], mom=[[0.9, 0.999]] [2022-12-17 01:27:28,759] [INFO] [timer.py:197:stop] 0/3260, RunningAvgSamplesPerSec=6.327395600822194, CurrSamplesPerSec=5.708160647422357, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:27:40,034] [INFO] [timer.py:197:stop] 0/3262, RunningAvgSamplesPerSec=6.327406874963428, CurrSamplesPerSec=5.70087105619365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:27:51,590] [INFO] [timer.py:197:stop] 0/3264, RunningAvgSamplesPerSec=6.327415745958481, CurrSamplesPerSec=5.710762066912186, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:28:03,039] [INFO] [timer.py:197:stop] 0/3266, RunningAvgSamplesPerSec=6.327404836900336, CurrSamplesPerSec=5.713503516490646, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-17 01:28:14,396] [INFO] [timer.py:197:stop] 0/3268, RunningAvgSamplesPerSec=6.327396207831295, CurrSamplesPerSec=5.646276457009662, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:28:25,688] [INFO] [timer.py:197:stop] 0/3270, RunningAvgSamplesPerSec=6.3274121524731095, CurrSamplesPerSec=5.7055994336984055, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:28:36,959] [INFO] [timer.py:197:stop] 0/3272, RunningAvgSamplesPerSec=6.327425382008135, CurrSamplesPerSec=5.689047554579456, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:28:48,243] [INFO] [timer.py:197:stop] 0/3274, RunningAvgSamplesPerSec=6.3274379194298485, CurrSamplesPerSec=5.697619473741943, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:28:59,539] [INFO] [timer.py:197:stop] 0/3276, RunningAvgSamplesPerSec=6.32745146020036, CurrSamplesPerSec=5.718524776283749, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:29:10,946] [INFO] [timer.py:197:stop] 0/3278, RunningAvgSamplesPerSec=6.327423523144233, CurrSamplesPerSec=5.705472342847645, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:29:22,224] [INFO] [logging.py:68:log_dist] [Rank 0] step=1640, skipped=5, lr=[7.48e-06], mom=[[0.9, 0.999]]
[2022-12-17 01:29:22,226] [INFO] [timer.py:197:stop] 0/3280, RunningAvgSamplesPerSec=6.327442652650779, CurrSamplesPerSec=5.71626756247743, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:29:33,726] [INFO] [timer.py:197:stop] 0/3282, RunningAvgSamplesPerSec=6.327382545022914, CurrSamplesPerSec=5.5422214418806774, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:29:45,216] [INFO] [timer.py:197:stop] 0/3284, RunningAvgSamplesPerSec=6.327418272970632, CurrSamplesPerSec=5.757968026648921, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:29:56,469] [INFO] [timer.py:197:stop] 0/3286, RunningAvgSamplesPerSec=6.327447503106981, CurrSamplesPerSec=5.728942503911237, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:30:07,728] [INFO] [timer.py:197:stop] 0/3288, RunningAvgSamplesPerSec=6.327468220307309, CurrSamplesPerSec=5.715481804066884, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:30:19,024] [INFO] [timer.py:197:stop] 0/3290, RunningAvgSamplesPerSec=6.327485129623551, CurrSamplesPerSec=5.706678721389803, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:30:30,322] [INFO] [timer.py:197:stop] 0/3292, RunningAvgSamplesPerSec=6.327491937727378, CurrSamplesPerSec=5.6889142072886045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:30:41,761] [INFO] [timer.py:197:stop] 0/3294, RunningAvgSamplesPerSec=6.327486622166938, CurrSamplesPerSec=5.659724931531434, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:30:53,050] [INFO] [timer.py:197:stop] 0/3296, RunningAvgSamplesPerSec=6.327504146250921, CurrSamplesPerSec=5.722502615290213, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:31:04,359] [INFO] [timer.py:197:stop] 0/3298, RunningAvgSamplesPerSec=6.32751865527712, CurrSamplesPerSec=5.730489843066372, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:31:15,663] [INFO] [logging.py:68:log_dist] [Rank 0] step=1650, skipped=5, lr=[7.457777777777778e-06], mom=[[0.9, 0.999]]
[2022-12-17 01:31:15,665] [INFO] [timer.py:197:stop] 0/3300, RunningAvgSamplesPerSec=6.3275283883850655, CurrSamplesPerSec=5.694366524941489, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.006, 'learning_rate': 7.457777777777778e-06, 'epoch': 6.99}
[2022-12-17 01:31:27,296] [INFO] [timer.py:197:stop] 0/3302, RunningAvgSamplesPerSec=6.327415691313412, CurrSamplesPerSec=5.403756153207737, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:31:35,779] [INFO] [timer.py:197:stop] 0/3304, RunningAvgSamplesPerSec=6.328368075171492, CurrSamplesPerSec=10.19376284874849, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:31:47,072] [INFO] [timer.py:197:stop] 0/3306, RunningAvgSamplesPerSec=6.328381889283473, CurrSamplesPerSec=5.726395837946484, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:31:58,454] [INFO] [timer.py:197:stop] 0/3308, RunningAvgSamplesPerSec=6.3283671736582425, CurrSamplesPerSec=5.629318652175197, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:32:10,088] [INFO] [timer.py:197:stop] 0/3310, RunningAvgSamplesPerSec=6.328373614216493, CurrSamplesPerSec=5.716848256979192, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:32:21,660] [INFO] [timer.py:197:stop] 0/3312, RunningAvgSamplesPerSec=6.328350430073364, CurrSamplesPerSec=5.590974509773374, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:32:32,935] [INFO] [timer.py:197:stop] 0/3314, RunningAvgSamplesPerSec=6.328370576080663, CurrSamplesPerSec=5.732091629603752, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:32:44,636] [INFO] [timer.py:197:stop] 0/3316, RunningAvgSamplesPerSec=6.328365927196959, CurrSamplesPerSec=5.682405641553271, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:32:56,099] [INFO] [timer.py:197:stop] 0/3318, RunningAvgSamplesPerSec=6.328378275281746, CurrSamplesPerSec=5.712058439492161, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:33:07,760] [INFO] [logging.py:68:log_dist] [Rank 0] step=1660, skipped=5, lr=[7.435555555555556e-06], mom=[[0.9, 0.999]]
[2022-12-17 01:33:07,762] [INFO] [timer.py:197:stop] 0/3320, RunningAvgSamplesPerSec=6.328251885037532, CurrSamplesPerSec=5.376068743066508, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:33:19,283] [INFO] [timer.py:197:stop] 0/3322, RunningAvgSamplesPerSec=6.328261381499621, CurrSamplesPerSec=5.7035596304887495, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:33:30,629] [INFO] [timer.py:197:stop] 0/3324, RunningAvgSamplesPerSec=6.328286561291607, CurrSamplesPerSec=5.727362761688731, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:33:41,914] [INFO] [timer.py:197:stop] 0/3326, RunningAvgSamplesPerSec=6.328296633717355, CurrSamplesPerSec=5.687658927245805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:33:53,317] [INFO] [timer.py:197:stop] 0/3328, RunningAvgSamplesPerSec=6.328316702351698, CurrSamplesPerSec=5.735226819198422, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:34:04,597] [INFO] [timer.py:197:stop] 0/3330, RunningAvgSamplesPerSec=6.328335218513787, CurrSamplesPerSec=5.72160806423622, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:34:15,952] [INFO] [timer.py:197:stop] 0/3332, RunningAvgSamplesPerSec=6.328321934465862, CurrSamplesPerSec=5.621941844546697, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:34:27,233] [INFO] [timer.py:197:stop] 0/3334, RunningAvgSamplesPerSec=6.328333758938271, CurrSamplesPerSec=5.710809692236099, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:34:38,478] [INFO] [timer.py:197:stop] 0/3336, RunningAvgSamplesPerSec=6.328358943259871, CurrSamplesPerSec=5.736436743153215, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:34:50,063] [INFO] [timer.py:197:stop] 0/3338, RunningAvgSamplesPerSec=6.328262610423019, CurrSamplesPerSec=5.446236956626006, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:35:01,355] [INFO] [logging.py:68:log_dist] [Rank 0] step=1670, skipped=5, lr=[7.413333333333333e-06], mom=[[0.9, 0.999]]
[2022-12-17 01:35:01,356] [INFO] [timer.py:197:stop] 0/3340, RunningAvgSamplesPerSec=6.3282707354726195, CurrSamplesPerSec=5.697444367262241, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:35:12,660] [INFO] [timer.py:197:stop] 0/3342, RunningAvgSamplesPerSec=6.32828023530498, CurrSamplesPerSec=5.694936978697808, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:35:24,015] [INFO] [timer.py:197:stop] 0/3344, RunningAvgSamplesPerSec=6.328290936313625, CurrSamplesPerSec=5.710620896512029, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:35:35,346] [INFO] [timer.py:197:stop] 0/3346, RunningAvgSamplesPerSec=6.328290690563437, CurrSamplesPerSec=5.6815291132287555, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:35:46,924] [INFO] [timer.py:197:stop] 0/3348, RunningAvgSamplesPerSec=6.328295836466862, CurrSamplesPerSec=5.713784203464906, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:35:58,407] [INFO] [timer.py:197:stop] 0/3350, RunningAvgSamplesPerSec=6.328292604414985, CurrSamplesPerSec=5.69902773076984, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0034, 'learning_rate': 7.402222222222223e-06, 'epoch': 7.1}
[2022-12-17 01:36:09,721] [INFO] [timer.py:197:stop] 0/3352, RunningAvgSamplesPerSec=6.328298902158826, CurrSamplesPerSec=5.690510203449995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:36:21,224] [INFO] [timer.py:197:stop] 0/3354, RunningAvgSamplesPerSec=6.328302617299473, CurrSamplesPerSec=5.688378710880185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:36:32,808] [INFO] [timer.py:197:stop] 0/3356, RunningAvgSamplesPerSec=6.328292203451692, CurrSamplesPerSec=5.710444018151198, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:36:44,164] [INFO] [timer.py:197:stop] 0/3358, RunningAvgSamplesPerSec=6.328287530561861, CurrSamplesPerSec=5.666746435361096, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:36:55,655] [INFO] [logging.py:68:log_dist] [Rank 0] step=1680, skipped=5, lr=[7.3911111111111125e-06], mom=[[0.9, 0.999]]
[2022-12-17 01:36:55,657] [INFO] [timer.py:197:stop] 0/3360, RunningAvgSamplesPerSec=6.328284289899163, CurrSamplesPerSec=5.6893407959260545, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:37:07,021] [INFO] [timer.py:197:stop] 0/3362, RunningAvgSamplesPerSec=6.3282828404144675, CurrSamplesPerSec=5.696201517383088, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:37:18,338] [INFO] [timer.py:197:stop] 0/3364, RunningAvgSamplesPerSec=6.328281482380329, CurrSamplesPerSec=5.668719525154133, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:37:29,622] [INFO] [timer.py:197:stop] 0/3366, RunningAvgSamplesPerSec=6.328287682718244, CurrSamplesPerSec=5.695697761521486, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:37:40,921] [INFO] [timer.py:197:stop] 0/3368, RunningAvgSamplesPerSec=6.328288491734812, CurrSamplesPerSec=5.6962090115450135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:37:52,274] [INFO] [timer.py:197:stop] 0/3370, RunningAvgSamplesPerSec=6.328280101909268, CurrSamplesPerSec=5.677296288862611, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:38:03,742] [INFO] [timer.py:197:stop] 0/3372, RunningAvgSamplesPerSec=6.328229320453348, CurrSamplesPerSec=5.558497111329943, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:38:15,340] [INFO] [timer.py:197:stop] 0/3374, RunningAvgSamplesPerSec=6.328237197685572, CurrSamplesPerSec=5.704464547971759, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:38:26,658] [INFO] [timer.py:197:stop] 0/3376, RunningAvgSamplesPerSec=6.328234872449634, CurrSamplesPerSec=5.693225477220795, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:38:38,015] [INFO] [timer.py:197:stop] 0/3378, RunningAvgSamplesPerSec=6.328223671446257, CurrSamplesPerSec=5.65220305102006, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:38:49,472] [INFO] [logging.py:68:log_dist] [Rank 0] step=1690, skipped=5, lr=[7.36888888888889e-06], mom=[[0.9, 0.999]]
[2022-12-17 01:38:49,474] [INFO] [timer.py:197:stop] 0/3380, RunningAvgSamplesPerSec=6.3282343269547425, CurrSamplesPerSec=5.715569424413849, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:39:00,780] [INFO] [timer.py:197:stop] 0/3382, RunningAvgSamplesPerSec=6.32824550333762, CurrSamplesPerSec=5.712211350120307, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:39:12,225] [INFO] [timer.py:197:stop] 0/3384, RunningAvgSamplesPerSec=6.3282038666680345, CurrSamplesPerSec=5.56816756394146, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:39:23,704] [INFO] [timer.py:197:stop] 0/3386, RunningAvgSamplesPerSec=6.32821226984345, CurrSamplesPerSec=5.716607199656674, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:39:35,238] [INFO] [timer.py:197:stop] 0/3388, RunningAvgSamplesPerSec=6.328216059373954, CurrSamplesPerSec=5.69550392144332, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:39:46,587] [INFO] [timer.py:197:stop] 0/3390, RunningAvgSamplesPerSec=6.328207698527992, CurrSamplesPerSec=5.657910737886335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:39:58,109] [INFO] [timer.py:197:stop] 0/3392, RunningAvgSamplesPerSec=6.328200542356812, CurrSamplesPerSec=5.686295550023471, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:40:09,520] [INFO] [timer.py:197:stop] 0/3394, RunningAvgSamplesPerSec=6.328197617950004, CurrSamplesPerSec=5.683399158756303, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:40:20,976] [INFO] [timer.py:197:stop] 0/3396, RunningAvgSamplesPerSec=6.328150606424908, CurrSamplesPerSec=5.551615532162709, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:40:32,598] [INFO] [timer.py:197:stop] 0/3398, RunningAvgSamplesPerSec=6.328154679161973, CurrSamplesPerSec=5.7109042162776715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:40:44,064] [INFO] [logging.py:68:log_dist] [Rank 0] step=1700, skipped=5, lr=[7.346666666666668e-06], mom=[[0.9, 0.999]]
[2022-12-17 01:40:44,066] [INFO] [timer.py:197:stop] 0/3400, RunningAvgSamplesPerSec=6.328143141046826, CurrSamplesPerSec=5.655962324635651, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0029, 'learning_rate': 7.346666666666668e-06, 'epoch': 7.2}
[2022-12-17 01:40:55,365] [INFO] [timer.py:197:stop] 0/3402, RunningAvgSamplesPerSec=6.328155188947058, CurrSamplesPerSec=5.71487048352276, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:41:06,858] [INFO] [timer.py:197:stop] 0/3404, RunningAvgSamplesPerSec=6.328172193969021, CurrSamplesPerSec=5.71332500030223, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:41:18,279] [INFO] [timer.py:197:stop] 0/3406, RunningAvgSamplesPerSec=6.328175729615554, CurrSamplesPerSec=5.685131486033896, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:41:29,613] [INFO] [timer.py:197:stop] 0/3408, RunningAvgSamplesPerSec=6.32816802041746, CurrSamplesPerSec=5.6532679449227095, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:41:41,121] [INFO] [timer.py:197:stop] 0/3410, RunningAvgSamplesPerSec=6.32818552917702, CurrSamplesPerSec=5.728407024762613, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:41:52,371] [INFO] [timer.py:197:stop] 0/3412, RunningAvgSamplesPerSec=6.328197040621683, CurrSamplesPerSec=5.717122941010747, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:42:03,686] [INFO] [timer.py:197:stop] 0/3414, RunningAvgSamplesPerSec=6.328196403591567, CurrSamplesPerSec=5.661743287944973, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:42:14,985] [INFO] [timer.py:197:stop] 0/3416, RunningAvgSamplesPerSec=6.328201656772722, CurrSamplesPerSec=5.6989974825817145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:42:26,307] [INFO] [timer.py:197:stop] 0/3418, RunningAvgSamplesPerSec=6.328212993972193, CurrSamplesPerSec=5.722051035744155, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:42:37,816] [INFO] [logging.py:68:log_dist] [Rank 0] step=1710, skipped=5, lr=[7.324444444444445e-06], mom=[[0.9, 0.999]]
[2022-12-17 01:42:37,817] [INFO] [timer.py:197:stop] 0/3420, RunningAvgSamplesPerSec=6.328228305425794, CurrSamplesPerSec=5.724880743494684, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:42:49,145] [INFO] [timer.py:197:stop] 0/3422, RunningAvgSamplesPerSec=6.32823030158857, CurrSamplesPerSec=5.703642280470542, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:43:00,666] [INFO] [timer.py:197:stop] 0/3424, RunningAvgSamplesPerSec=6.328230234209142, CurrSamplesPerSec=5.680257617713291, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:43:12,106] [INFO] [timer.py:197:stop] 0/3426, RunningAvgSamplesPerSec=6.328209177056322, CurrSamplesPerSec=5.696820698340513, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:43:23,409] [INFO] [timer.py:197:stop] 0/3428, RunningAvgSamplesPerSec=6.3282132302632395, CurrSamplesPerSec=5.693795460828744, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:43:34,701] [INFO] [timer.py:197:stop] 0/3430, RunningAvgSamplesPerSec=6.328227607550369, CurrSamplesPerSec=5.724152425560023, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:43:46,016] [INFO] [timer.py:197:stop] 0/3432, RunningAvgSamplesPerSec=6.328220581200692, CurrSamplesPerSec=5.687688573141918, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:43:57,323] [INFO] [timer.py:197:stop] 0/3434, RunningAvgSamplesPerSec=6.328222375359132, CurrSamplesPerSec=5.694503027191691, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:44:08,631] [INFO] [timer.py:197:stop] 0/3436, RunningAvgSamplesPerSec=6.3282244903867655, CurrSamplesPerSec=5.698355328825123, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:44:19,966] [INFO] [timer.py:197:stop] 0/3438, RunningAvgSamplesPerSec=6.328221378368761, CurrSamplesPerSec=5.709337076552991, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:44:31,309] [INFO] [logging.py:68:log_dist] [Rank 0] step=1720, skipped=5, lr=[7.302222222222223e-06], mom=[[0.9, 0.999]]
[2022-12-17 01:44:31,311] [INFO] [timer.py:197:stop] 0/3440, RunningAvgSamplesPerSec=6.328215230487609, CurrSamplesPerSec=5.695270702428186, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:44:42,613] [INFO] [timer.py:197:stop] 0/3442, RunningAvgSamplesPerSec=6.328213318697902, CurrSamplesPerSec=5.67559753251525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:44:54,239] [INFO] [timer.py:197:stop] 0/3444, RunningAvgSamplesPerSec=6.328210899339845, CurrSamplesPerSec=5.709816528518276, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:45:05,842] [INFO] [timer.py:197:stop] 0/3446, RunningAvgSamplesPerSec=6.328206637922383, CurrSamplesPerSec=5.678575828674865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:45:17,347] [INFO] [timer.py:197:stop] 0/3448, RunningAvgSamplesPerSec=6.328141753874068, CurrSamplesPerSec=5.502676399186368, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:45:28,891] [INFO] [timer.py:197:stop] 0/3450, RunningAvgSamplesPerSec=6.328143387054643, CurrSamplesPerSec=5.6908574035335215, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0031, 'learning_rate': 7.291111111111112e-06, 'epoch': 7.31}
[2022-12-17 01:45:40,214] [INFO] [timer.py:197:stop] 0/3452, RunningAvgSamplesPerSec=6.328145290413738, CurrSamplesPerSec=5.703460259781672, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:45:51,590] [INFO] [timer.py:197:stop] 0/3454, RunningAvgSamplesPerSec=6.328127466213744, CurrSamplesPerSec=5.6478186708970615, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:46:02,864] [INFO] [timer.py:197:stop] 0/3456, RunningAvgSamplesPerSec=6.328141249502755, CurrSamplesPerSec=5.700699381725708, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:46:14,228] [INFO] [timer.py:197:stop] 0/3458, RunningAvgSamplesPerSec=6.328139444500353, CurrSamplesPerSec=5.6923542991128775, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:46:25,591] [INFO] [logging.py:68:log_dist] [Rank 0] step=1730, skipped=5, lr=[7.280000000000001e-06], mom=[[0.9, 0.999]]
[2022-12-17 01:46:25,593] [INFO] [timer.py:197:stop] 0/3460, RunningAvgSamplesPerSec=6.328127248928353, CurrSamplesPerSec=5.651214461156085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:46:36,918] [INFO] [timer.py:197:stop] 0/3462, RunningAvgSamplesPerSec=6.328130860638707, CurrSamplesPerSec=5.695780667229012, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:46:48,282] [INFO] [timer.py:197:stop] 0/3464,
RunningAvgSamplesPerSec=6.328137597693461, CurrSamplesPerSec=5.706588461866827, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:46:59,698] [INFO] [timer.py:197:stop] 0/3466, RunningAvgSamplesPerSec=6.328131763092281, CurrSamplesPerSec=5.690822898780078, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:47:11,036] [INFO] [timer.py:197:stop] 0/3468, RunningAvgSamplesPerSec=6.328129722496397, CurrSamplesPerSec=5.689334284482466, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:47:22,570] [INFO] [timer.py:197:stop] 0/3470, RunningAvgSamplesPerSec=6.3281306644154665, CurrSamplesPerSec=5.6844936568667706, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:47:34,221] [INFO] [timer.py:197:stop] 0/3472, RunningAvgSamplesPerSec=6.328106526053606, CurrSamplesPerSec=5.656347037051428, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:47:45,572] [INFO] [timer.py:197:stop] 0/3474, RunningAvgSamplesPerSec=6.328093162953867, CurrSamplesPerSec=5.674111114859532, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:47:56,891] [INFO] [timer.py:197:stop] 0/3476, RunningAvgSamplesPerSec=6.328105604170656, CurrSamplesPerSec=5.713234044027637, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:48:08,451] [INFO] [timer.py:197:stop] 0/3478, RunningAvgSamplesPerSec=6.328097448272549, CurrSamplesPerSec=5.6755171332732255, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:48:19,872] [INFO] [logging.py:68:log_dist] [Rank 0] step=1740, skipped=5, lr=[7.257777777777778e-06], mom=[[0.9, 0.999]] [2022-12-17 01:48:19,874] [INFO] [timer.py:197:stop] 0/3480, RunningAvgSamplesPerSec=6.32806475412414, CurrSamplesPerSec=5.58979722711045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:48:31,197] [INFO] [timer.py:197:stop] 0/3482, RunningAvgSamplesPerSec=6.328066269637509, CurrSamplesPerSec=5.684986041644773, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:48:42,589] [INFO] [timer.py:197:stop] 0/3484, 
RunningAvgSamplesPerSec=6.3280702776950415, CurrSamplesPerSec=5.7072453349538295, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:48:53,886] [INFO] [timer.py:197:stop] 0/3486, RunningAvgSamplesPerSec=6.328081295245514, CurrSamplesPerSec=5.694302504016668, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:49:05,188] [INFO] [timer.py:197:stop] 0/3488, RunningAvgSamplesPerSec=6.328090685078356, CurrSamplesPerSec=5.711951479849029, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:49:16,555] [INFO] [timer.py:197:stop] 0/3490, RunningAvgSamplesPerSec=6.328082367812376, CurrSamplesPerSec=5.669608151295856, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:49:27,952] [INFO] [timer.py:197:stop] 0/3492, RunningAvgSamplesPerSec=6.328083131471704, CurrSamplesPerSec=5.698459602429974, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:49:39,247] [INFO] [timer.py:197:stop] 0/3494, RunningAvgSamplesPerSec=6.328094558189582, CurrSamplesPerSec=5.7052138129041605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:49:50,620] [INFO] [timer.py:197:stop] 0/3496, RunningAvgSamplesPerSec=6.328078205743374, CurrSamplesPerSec=5.728926609252932, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:50:01,962] [INFO] [timer.py:197:stop] 0/3498, RunningAvgSamplesPerSec=6.328074417141918, CurrSamplesPerSec=5.706643053979031, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:50:13,267] [INFO] [logging.py:68:log_dist] [Rank 0] step=1750, skipped=5, lr=[7.235555555555556e-06], mom=[[0.9, 0.999]] [2022-12-17 01:50:13,269] [INFO] [timer.py:197:stop] 0/3500, RunningAvgSamplesPerSec=6.328076788387537, CurrSamplesPerSec=5.706652274083303, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.003, 'learning_rate': 7.235555555555556e-06, 'epoch': 7.42} [2022-12-17 01:50:24,905] [INFO] [timer.py:197:stop] 0/3502, RunningAvgSamplesPerSec=6.328095548443352, CurrSamplesPerSec=5.735608664701889, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-17 01:50:36,205] [INFO] [timer.py:197:stop] 0/3504, RunningAvgSamplesPerSec=6.3281112245341005, CurrSamplesPerSec=5.726553426443668, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:50:47,875] [INFO] [timer.py:197:stop] 0/3506, RunningAvgSamplesPerSec=6.327985124152368, CurrSamplesPerSec=5.32379925311647, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:50:59,259] [INFO] [timer.py:197:stop] 0/3508, RunningAvgSamplesPerSec=6.327988019352231, CurrSamplesPerSec=5.704149382054898, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:51:10,603] [INFO] [timer.py:197:stop] 0/3510, RunningAvgSamplesPerSec=6.327990555329165, CurrSamplesPerSec=5.695556609834613, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:51:22,017] [INFO] [timer.py:197:stop] 0/3512, RunningAvgSamplesPerSec=6.327956274360677, CurrSamplesPerSec=5.582028110491425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:51:33,525] [INFO] [timer.py:197:stop] 0/3514, RunningAvgSamplesPerSec=6.3279682355017295, CurrSamplesPerSec=5.71344684733673, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:51:45,006] [INFO] [timer.py:197:stop] 0/3516, RunningAvgSamplesPerSec=6.3279713819602526, CurrSamplesPerSec=5.690985291933365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:51:56,316] [INFO] [timer.py:197:stop] 0/3518, RunningAvgSamplesPerSec=6.327972735363504, CurrSamplesPerSec=5.682923170379605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:52:07,571] [INFO] [logging.py:68:log_dist] [Rank 0] step=1760, skipped=5, lr=[7.213333333333334e-06], mom=[[0.9, 0.999]]
[2022-12-17 01:52:07,573] [INFO] [timer.py:197:stop] 0/3520, RunningAvgSamplesPerSec=6.328000632457452, CurrSamplesPerSec=5.740372033043068, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:52:18,880] [INFO] [timer.py:197:stop] 0/3522, RunningAvgSamplesPerSec=6.328007794070124, CurrSamplesPerSec=5.695015512624356, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:52:30,399] [INFO] [timer.py:197:stop] 0/3524, RunningAvgSamplesPerSec=6.327934943934382, CurrSamplesPerSec=5.490017198827353, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:52:41,685] [INFO] [timer.py:197:stop] 0/3526, RunningAvgSamplesPerSec=6.327943588080988, CurrSamplesPerSec=5.716432871876821, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:52:52,954] [INFO] [timer.py:197:stop] 0/3528, RunningAvgSamplesPerSec=6.327953556698756, CurrSamplesPerSec=5.71647304423527, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:53:04,283] [INFO] [timer.py:197:stop] 0/3530, RunningAvgSamplesPerSec=6.327952891375237, CurrSamplesPerSec=5.721589039463135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:53:15,575] [INFO] [timer.py:197:stop] 0/3532, RunningAvgSamplesPerSec=6.32796634313063, CurrSamplesPerSec=5.703908182437717, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:53:26,987] [INFO] [timer.py:197:stop] 0/3534, RunningAvgSamplesPerSec=6.327938074298554, CurrSamplesPerSec=5.603109373402547, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:53:38,263] [INFO] [timer.py:197:stop] 0/3536, RunningAvgSamplesPerSec=6.327952225669314, CurrSamplesPerSec=5.7100358791340815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:53:49,603] [INFO] [timer.py:197:stop] 0/3538, RunningAvgSamplesPerSec=6.327958608696362, CurrSamplesPerSec=5.687503713127321, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:54:00,963] [INFO] [logging.py:68:log_dist] [Rank 0] step=1770, skipped=5, lr=[7.191111111111112e-06], mom=[[0.9, 0.999]]
[2022-12-17 01:54:00,965] [INFO] [timer.py:197:stop] 0/3540, RunningAvgSamplesPerSec=6.327936826914226, CurrSamplesPerSec=5.608218640493311, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:54:12,321] [INFO] [timer.py:197:stop] 0/3542, RunningAvgSamplesPerSec=6.3279281870182045, CurrSamplesPerSec=5.630785943403979, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:54:23,603] [INFO] [timer.py:197:stop] 0/3544, RunningAvgSamplesPerSec=6.327943843855838, CurrSamplesPerSec=5.717282942013153, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:54:34,916] [INFO] [timer.py:197:stop] 0/3546, RunningAvgSamplesPerSec=6.327943092314098, CurrSamplesPerSec=5.711298140639289, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:54:46,198] [INFO] [timer.py:197:stop] 0/3548, RunningAvgSamplesPerSec=6.3279544527004665, CurrSamplesPerSec=5.705776012064379, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:54:57,683] [INFO] [timer.py:197:stop] 0/3550, RunningAvgSamplesPerSec=6.327904100743331, CurrSamplesPerSec=5.527757973509886, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0028, 'learning_rate': 7.180000000000001e-06, 'epoch': 7.52}
[2022-12-17 01:55:08,958] [INFO] [timer.py:197:stop] 0/3552, RunningAvgSamplesPerSec=6.3279175861859605, CurrSamplesPerSec=5.7165658080201425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:55:20,232] [INFO] [timer.py:197:stop] 0/3554, RunningAvgSamplesPerSec=6.327936571369671, CurrSamplesPerSec=5.722424053381545, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:55:31,823] [INFO] [timer.py:197:stop] 0/3556, RunningAvgSamplesPerSec=6.327928871169333, CurrSamplesPerSec=5.661712956586682, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:55:43,120] [INFO] [timer.py:197:stop] 0/3558, RunningAvgSamplesPerSec=6.327938712237207, CurrSamplesPerSec=5.704607111379388, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:55:54,456] [INFO] [logging.py:68:log_dist] [Rank 0] step=1780, skipped=5, lr=[7.1688888888888895e-06], mom=[[0.9, 0.999]]
[2022-12-17 01:55:54,458] [INFO] [timer.py:197:stop] 0/3560, RunningAvgSamplesPerSec=6.327929659004718, CurrSamplesPerSec=5.653885686286007, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:56:05,826] [INFO] [timer.py:197:stop] 0/3562, RunningAvgSamplesPerSec=6.327918576109895, CurrSamplesPerSec=5.729235959602013, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:56:17,168] [INFO] [timer.py:197:stop] 0/3564, RunningAvgSamplesPerSec=6.327915134381827, CurrSamplesPerSec=5.672132353009539, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:56:28,474] [INFO] [timer.py:197:stop] 0/3566, RunningAvgSamplesPerSec=6.327923942699098, CurrSamplesPerSec=5.682262982816896, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:56:39,759] [INFO] [timer.py:197:stop] 0/3568, RunningAvgSamplesPerSec=6.327934093570251, CurrSamplesPerSec=5.710346593953746, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:56:51,044] [INFO] [timer.py:197:stop] 0/3570, RunningAvgSamplesPerSec=6.327950174366859, CurrSamplesPerSec=5.712983564444499, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:57:02,649] [INFO] [timer.py:197:stop] 0/3572, RunningAvgSamplesPerSec=6.327926860476011, CurrSamplesPerSec=5.694614891436664, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:57:13,930] [INFO] [timer.py:197:stop] 0/3574, RunningAvgSamplesPerSec=6.327938623697873, CurrSamplesPerSec=5.726775042307256, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:57:25,236] [INFO] [timer.py:197:stop] 0/3576, RunningAvgSamplesPerSec=6.327943893060957, CurrSamplesPerSec=5.705478406213814, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:57:36,872] [INFO] [timer.py:197:stop] 0/3578, RunningAvgSamplesPerSec=6.3278364995467475, CurrSamplesPerSec=5.669009956245839, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:57:48,211] [INFO] [logging.py:68:log_dist] [Rank 0] step=1790, skipped=5, lr=[7.146666666666667e-06], mom=[[0.9, 0.999]]
[2022-12-17 01:57:48,213] [INFO] [timer.py:197:stop] 0/3580, RunningAvgSamplesPerSec=6.327832187742601, CurrSamplesPerSec=5.691824672039203, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:57:59,636] [INFO] [timer.py:197:stop] 0/3582, RunningAvgSamplesPerSec=6.327800200782722, CurrSamplesPerSec=5.603391950625858, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:58:10,952] [INFO] [timer.py:197:stop] 0/3584, RunningAvgSamplesPerSec=6.327805501869577, CurrSamplesPerSec=5.700271330203468, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:58:22,266] [INFO] [timer.py:197:stop] 0/3586, RunningAvgSamplesPerSec=6.327811376478618, CurrSamplesPerSec=5.715144490811053, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:58:33,529] [INFO] [timer.py:197:stop] 0/3588, RunningAvgSamplesPerSec=6.327829616725786, CurrSamplesPerSec=5.723332526685991, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:58:44,848] [INFO] [timer.py:197:stop] 0/3590, RunningAvgSamplesPerSec=6.327828118350446, CurrSamplesPerSec=5.67359015323349, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:58:56,165] [INFO] [timer.py:197:stop] 0/3592, RunningAvgSamplesPerSec=6.327827434121203, CurrSamplesPerSec=5.677497297199891, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:59:07,479] [INFO] [timer.py:197:stop] 0/3594, RunningAvgSamplesPerSec=6.327833910620886, CurrSamplesPerSec=5.710358255527924, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:59:18,805] [INFO] [timer.py:197:stop] 0/3596, RunningAvgSamplesPerSec=6.327836396013567, CurrSamplesPerSec=5.689810141446445, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:59:30,151] [INFO] [timer.py:197:stop] 0/3598, RunningAvgSamplesPerSec=6.327831213010918, CurrSamplesPerSec=5.687007036678088, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 01:59:41,493] [INFO] [logging.py:68:log_dist] [Rank 0] step=1800, skipped=5, lr=[7.124444444444445e-06], mom=[[0.9, 0.999]]
[2022-12-17 01:59:41,495] [INFO] [timer.py:197:stop] 0/3600, RunningAvgSamplesPerSec=6.327827075130297, CurrSamplesPerSec=5.684304911920582, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0039, 'learning_rate': 7.124444444444445e-06, 'epoch': 7.63}
[2022-12-17 01:59:52,885] [INFO] [timer.py:197:stop] 0/3602, RunningAvgSamplesPerSec=6.327801323103608, CurrSamplesPerSec=5.704719615425174, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:00:04,167] [INFO] [timer.py:197:stop] 0/3604, RunningAvgSamplesPerSec=6.327815179366836, CurrSamplesPerSec=5.712152032465933, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:00:15,433] [INFO] [timer.py:197:stop] 0/3606, RunningAvgSamplesPerSec=6.32784080816605, CurrSamplesPerSec=5.72127319793468, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:00:26,682] [INFO] [timer.py:197:stop] 0/3608, RunningAvgSamplesPerSec=6.327863358791299, CurrSamplesPerSec=5.728916583436902, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:00:37,963] [INFO] [timer.py:197:stop] 0/3610, RunningAvgSamplesPerSec=6.327880859980065, CurrSamplesPerSec=5.719402277828227, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:00:49,236] [INFO] [timer.py:197:stop] 0/3612, RunningAvgSamplesPerSec=6.32789408836473, CurrSamplesPerSec=5.702605087742875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:01:01,131] [INFO] [timer.py:197:stop] 0/3614, RunningAvgSamplesPerSec=6.327892039894871, CurrSamplesPerSec=5.668299135170656, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:01:12,791] [INFO] [timer.py:197:stop] 0/3616, RunningAvgSamplesPerSec=6.327888579025662, CurrSamplesPerSec=5.682090502814677, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:01:24,764] [INFO] [timer.py:197:stop] 0/3618, RunningAvgSamplesPerSec=6.327884018322994, CurrSamplesPerSec=5.668504534331354, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:01:36,124] [INFO] [logging.py:68:log_dist] [Rank 0] step=1810, skipped=5, lr=[7.102222222222222e-06], mom=[[0.9, 0.999]]
[2022-12-17 02:01:36,125] [INFO] [timer.py:197:stop] 0/3620, RunningAvgSamplesPerSec=6.3278921259197904, CurrSamplesPerSec=5.714443464157503, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:01:47,415] [INFO] [timer.py:197:stop] 0/3622, RunningAvgSamplesPerSec=6.3279065726852615, CurrSamplesPerSec=5.725988835540606, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:01:58,700] [INFO] [timer.py:197:stop] 0/3624, RunningAvgSamplesPerSec=6.327922066863667, CurrSamplesPerSec=5.720485819054075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:02:10,061] [INFO] [timer.py:197:stop] 0/3626, RunningAvgSamplesPerSec=6.327913184941603, CurrSamplesPerSec=5.708340783251031, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:02:21,389] [INFO] [timer.py:197:stop] 0/3628, RunningAvgSamplesPerSec=6.327922140001181, CurrSamplesPerSec=5.695893065117951, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:02:32,688] [INFO] [timer.py:197:stop] 0/3630, RunningAvgSamplesPerSec=6.327936577561977, CurrSamplesPerSec=5.712554395636458, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:02:44,121] [INFO] [timer.py:197:stop] 0/3632, RunningAvgSamplesPerSec=6.327962876653724, CurrSamplesPerSec=5.729724140830995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:02:55,432] [INFO] [timer.py:197:stop] 0/3634, RunningAvgSamplesPerSec=6.327980036999522, CurrSamplesPerSec=5.727084404708527, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:03:06,839] [INFO] [timer.py:197:stop] 0/3636, RunningAvgSamplesPerSec=6.327956709513156, CurrSamplesPerSec=5.602805073350317, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:03:18,401] [INFO] [timer.py:197:stop] 0/3638, RunningAvgSamplesPerSec=6.32797987065095, CurrSamplesPerSec=5.736248700806572, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:03:29,754] [INFO] [logging.py:68:log_dist] [Rank 0] step=1820, skipped=5, lr=[7.08e-06], mom=[[0.9, 0.999]]
[2022-12-17 02:03:29,755] [INFO] [timer.py:197:stop] 0/3640, RunningAvgSamplesPerSec=6.327993811918197, CurrSamplesPerSec=5.707425413016101, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:03:41,156] [INFO] [timer.py:197:stop] 0/3642, RunningAvgSamplesPerSec=6.327964199154306, CurrSamplesPerSec=5.58800687342309, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:03:52,578] [INFO] [timer.py:197:stop] 0/3644, RunningAvgSamplesPerSec=6.3279903684733325, CurrSamplesPerSec=5.734441965307107, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:04:03,840] [INFO] [timer.py:197:stop] 0/3646, RunningAvgSamplesPerSec=6.328012406815716, CurrSamplesPerSec=5.729712644625185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:04:15,208] [INFO] [timer.py:197:stop] 0/3648, RunningAvgSamplesPerSec=6.327998777121916, CurrSamplesPerSec=5.6463310889257095, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:04:26,567] [INFO] [timer.py:197:stop] 0/3650, RunningAvgSamplesPerSec=6.328016363517222, CurrSamplesPerSec=5.729583498930007, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0031, 'learning_rate': 7.06888888888889e-06, 'epoch': 7.73}
[2022-12-17 02:04:37,835] [INFO] [timer.py:197:stop] 0/3652, RunningAvgSamplesPerSec=6.3280379957539425, CurrSamplesPerSec=5.750655218175977, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:04:49,140] [INFO] [timer.py:197:stop] 0/3654, RunningAvgSamplesPerSec=6.328047093226549, CurrSamplesPerSec=5.692673716223513, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:05:00,555] [INFO] [timer.py:197:stop] 0/3656, RunningAvgSamplesPerSec=6.328070328408502, CurrSamplesPerSec=5.729652962936433, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:05:11,824] [INFO] [timer.py:197:stop] 0/3658, RunningAvgSamplesPerSec=6.328090833625429, CurrSamplesPerSec=5.722410634637678, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:05:23,067] [INFO] [logging.py:68:log_dist] [Rank 0] step=1830, skipped=5, lr=[7.057777777777778e-06], mom=[[0.9, 0.999]]
[2022-12-17 02:05:23,068] [INFO] [timer.py:197:stop] 0/3660, RunningAvgSamplesPerSec=6.328114198755333, CurrSamplesPerSec=5.753315006315808, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:05:34,318] [INFO] [timer.py:197:stop] 0/3662, RunningAvgSamplesPerSec=6.328140992662758, CurrSamplesPerSec=5.763149688904019, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:05:45,814] [INFO] [timer.py:197:stop] 0/3664, RunningAvgSamplesPerSec=6.328166430047528, CurrSamplesPerSec=5.744112367045322, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:05:57,107] [INFO] [timer.py:197:stop] 0/3666, RunningAvgSamplesPerSec=6.328179424638803, CurrSamplesPerSec=5.704464790420485, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:06:08,364] [INFO] [timer.py:197:stop] 0/3668, RunningAvgSamplesPerSec=6.328198479294168, CurrSamplesPerSec=5.732478199279469, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:06:19,630] [INFO] [timer.py:197:stop] 0/3670, RunningAvgSamplesPerSec=6.3282200623513365, CurrSamplesPerSec=5.71588853121749, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:06:30,920] [INFO] [timer.py:197:stop] 0/3672, RunningAvgSamplesPerSec=6.328233558552581, CurrSamplesPerSec=5.725239477060349, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:06:42,190] [INFO] [timer.py:197:stop] 0/3674, RunningAvgSamplesPerSec=6.3282541228592954, CurrSamplesPerSec=5.716858727606408, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:06:53,472] [INFO] [timer.py:197:stop] 0/3676, RunningAvgSamplesPerSec=6.328270568160537, CurrSamplesPerSec=5.739481949746398, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:07:04,744] [INFO] [timer.py:197:stop] 0/3678, RunningAvgSamplesPerSec=6.328284690470813, CurrSamplesPerSec=5.716911568333583, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:07:16,014] [INFO] [logging.py:68:log_dist] [Rank 0] step=1840, skipped=5, lr=[7.035555555555557e-06], mom=[[0.9, 0.999]]
[2022-12-17 02:07:16,016] [INFO] [timer.py:197:stop] 0/3680, RunningAvgSamplesPerSec=6.328304220828954, CurrSamplesPerSec=5.7337428103257135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:07:27,284] [INFO] [timer.py:197:stop] 0/3682, RunningAvgSamplesPerSec=6.328324947917273, CurrSamplesPerSec=5.739207076337193, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:07:38,555] [INFO] [timer.py:197:stop] 0/3684, RunningAvgSamplesPerSec=6.3283337295802875, CurrSamplesPerSec=5.710208845213026, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:07:49,871] [INFO] [timer.py:197:stop] 0/3686, RunningAvgSamplesPerSec=6.328337637182703, CurrSamplesPerSec=5.692891027263141, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:08:01,209] [INFO] [timer.py:197:stop] 0/3688, RunningAvgSamplesPerSec=6.328334295974981, CurrSamplesPerSec=5.6979128737779705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:08:12,502] [INFO] [timer.py:197:stop] 0/3690, RunningAvgSamplesPerSec=6.328341546989031, CurrSamplesPerSec=5.702536278019622, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:08:23,789] [INFO] [timer.py:197:stop] 0/3692, RunningAvgSamplesPerSec=6.328349056114674, CurrSamplesPerSec=5.704499945703904, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:08:35,047] [INFO] [timer.py:197:stop] 0/3694, RunningAvgSamplesPerSec=6.32836730817153, CurrSamplesPerSec=5.725636847106058, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:08:46,582] [INFO] [timer.py:197:stop] 0/3696, RunningAvgSamplesPerSec=6.328376371606257, CurrSamplesPerSec=5.706006938727323, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:08:57,904] [INFO] [timer.py:197:stop] 0/3698, RunningAvgSamplesPerSec=6.328378093780299, CurrSamplesPerSec=5.7031231522802575, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:09:09,196] [INFO] [logging.py:68:log_dist] [Rank 0] step=1850, skipped=5, lr=[7.0133333333333345e-06], mom=[[0.9, 0.999]]
[2022-12-17 02:09:09,198] [INFO] [timer.py:197:stop] 0/3700, RunningAvgSamplesPerSec=6.328390288445043, CurrSamplesPerSec=5.716338164733072, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0027, 'learning_rate': 7.0133333333333345e-06, 'epoch': 7.84}
[2022-12-17 02:09:20,481] [INFO] [timer.py:197:stop] 0/3702, RunningAvgSamplesPerSec=6.328405384962958, CurrSamplesPerSec=5.730546116797611, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:09:31,754] [INFO] [timer.py:197:stop] 0/3704, RunningAvgSamplesPerSec=6.328418593830659, CurrSamplesPerSec=5.720126705806713, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:09:43,071] [INFO] [timer.py:197:stop] 0/3706, RunningAvgSamplesPerSec=6.3284230163254644, CurrSamplesPerSec=5.692320500457679, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:09:54,375] [INFO] [timer.py:197:stop] 0/3708, RunningAvgSamplesPerSec=6.32843249908911, CurrSamplesPerSec=5.7056054973347035, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:10:05,694] [INFO] [timer.py:197:stop] 0/3710, RunningAvgSamplesPerSec=6.328436656248839, CurrSamplesPerSec=5.7148286303710005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:10:16,980] [INFO] [timer.py:197:stop] 0/3712, RunningAvgSamplesPerSec=6.328446140698633, CurrSamplesPerSec=5.71491842073463, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:10:28,279] [INFO] [timer.py:197:stop] 0/3714, RunningAvgSamplesPerSec=6.328459971048828, CurrSamplesPerSec=5.726909192988373, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:10:39,581] [INFO] [timer.py:197:stop] 0/3716, RunningAvgSamplesPerSec=6.328468354521305, CurrSamplesPerSec=5.703816313777253, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:10:50,907] [INFO] [timer.py:197:stop] 0/3718, RunningAvgSamplesPerSec=6.328463095854033, CurrSamplesPerSec=5.686705361943823, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:11:02,197] [INFO] [logging.py:68:log_dist] [Rank 0] step=1860, skipped=5, lr=[6.991111111111112e-06], mom=[[0.9, 0.999]]
[2022-12-17 02:11:02,198] [INFO] [timer.py:197:stop] 0/3720, RunningAvgSamplesPerSec=6.328475951506111, CurrSamplesPerSec=5.72845078847984, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:11:13,481] [INFO] [timer.py:197:stop] 0/3722, RunningAvgSamplesPerSec=6.328491494415783, CurrSamplesPerSec=5.727654589014197, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:11:24,776] [INFO] [timer.py:197:stop] 0/3724, RunningAvgSamplesPerSec=6.328503029556852, CurrSamplesPerSec=5.7198580707958415, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:11:36,078] [INFO] [timer.py:197:stop] 0/3726, RunningAvgSamplesPerSec=6.328512845945567, CurrSamplesPerSec=5.716436036951238, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:11:47,389] [INFO] [timer.py:197:stop] 0/3728, RunningAvgSamplesPerSec=6.328519648234641, CurrSamplesPerSec=5.696660631635473, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:11:58,710] [INFO] [timer.py:197:stop] 0/3730, RunningAvgSamplesPerSec=6.3285224714645985, CurrSamplesPerSec=5.706504755990071, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:12:10,021] [INFO] [timer.py:197:stop] 0/3732, RunningAvgSamplesPerSec=6.328523854481922, CurrSamplesPerSec=5.680343440173591, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:12:21,373] [INFO] [timer.py:197:stop] 0/3734, RunningAvgSamplesPerSec=6.328516952220104, CurrSamplesPerSec=5.674356038258372, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:12:32,655] [INFO] [timer.py:197:stop] 0/3736, RunningAvgSamplesPerSec=6.328527993386, CurrSamplesPerSec=5.715351595458335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:12:43,985] [INFO] [timer.py:197:stop] 0/3738, RunningAvgSamplesPerSec=6.328523357191348, CurrSamplesPerSec=5.673340020114448, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:12:55,366] [INFO] [logging.py:68:log_dist] [Rank 0] step=1870, skipped=5, lr=[6.96888888888889e-06], mom=[[0.9, 0.999]]
[2022-12-17 02:12:55,367] [INFO] [timer.py:197:stop] 0/3740, RunningAvgSamplesPerSec=6.328530895199716, CurrSamplesPerSec=5.704396420696285, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:13:06,720] [INFO] [timer.py:197:stop] 0/3742, RunningAvgSamplesPerSec=6.32852430551046, CurrSamplesPerSec=5.666163435402976, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:13:18,043] [INFO] [timer.py:197:stop] 0/3744, RunningAvgSamplesPerSec=6.328526981645077, CurrSamplesPerSec=5.691222021288974, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:13:29,477] [INFO] [timer.py:197:stop] 0/3746, RunningAvgSamplesPerSec=6.32854021618564, CurrSamplesPerSec=5.715457222174339, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:13:40,814] [INFO] [timer.py:197:stop] 0/3748, RunningAvgSamplesPerSec=6.32853715357103, CurrSamplesPerSec=5.673013417356105, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:13:52,112] [INFO] [timer.py:197:stop] 0/3750, RunningAvgSamplesPerSec=6.3285428126775445, CurrSamplesPerSec=5.69774863376101, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0034, 'learning_rate': 6.9577777777777785e-06, 'epoch': 7.94}
[2022-12-17 02:14:03,395] [INFO] [timer.py:197:stop] 0/3752, RunningAvgSamplesPerSec=6.32855860428077, CurrSamplesPerSec=5.7144400579917765, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:14:14,669] [INFO] [timer.py:197:stop] 0/3754, RunningAvgSamplesPerSec=6.32857254990377, CurrSamplesPerSec=5.707274214632692, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:14:25,976] [INFO] [timer.py:197:stop] 0/3756, RunningAvgSamplesPerSec=6.32858012388723, CurrSamplesPerSec=5.7056343604201665, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:14:37,275] [INFO] [timer.py:197:stop] 0/3758, RunningAvgSamplesPerSec=6.328582928084006, CurrSamplesPerSec=5.690336016171647, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:14:48,581] [INFO] [logging.py:68:log_dist] [Rank 0] step=1880, skipped=5, lr=[6.946666666666667e-06], mom=[[0.9, 0.999]]
[2022-12-17 02:14:48,583] [INFO] [timer.py:197:stop] 0/3760, RunningAvgSamplesPerSec=6.328590303890887, CurrSamplesPerSec=5.710872140930814, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:14:59,882] [INFO] [timer.py:197:stop] 0/3762, RunningAvgSamplesPerSec=6.328600848606018, CurrSamplesPerSec=5.727032841987243, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:15:11,192] [INFO] [timer.py:197:stop] 0/3764, RunningAvgSamplesPerSec=6.328607403088262, CurrSamplesPerSec=5.707744824701184, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:15:22,491] [INFO] [timer.py:197:stop] 0/3766, RunningAvgSamplesPerSec=6.328617861925704, CurrSamplesPerSec=5.7186139518732695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:15:33,866] [INFO] [timer.py:197:stop] 0/3768, RunningAvgSamplesPerSec=6.328603286767437, CurrSamplesPerSec=5.653605852250859, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:15:45,131] [INFO] [timer.py:197:stop] 0/3770, RunningAvgSamplesPerSec=6.3286202165529755, CurrSamplesPerSec=5.713061624172338, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:15:56,520] [INFO] [timer.py:197:stop] 0/3772, RunningAvgSamplesPerSec=6.328601493207364, CurrSamplesPerSec=5.618634523949762, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:16:07,950] [INFO] [timer.py:197:stop] 0/3774, RunningAvgSamplesPerSec=6.328591767717235, CurrSamplesPerSec=5.6332066490927115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:16:16,490] [INFO] [timer.py:197:stop] 0/3776, RunningAvgSamplesPerSec=6.329405434285961, CurrSamplesPerSec=10.041965532176695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:16:27,738] [INFO] [timer.py:197:stop] 0/3778, RunningAvgSamplesPerSec=6.329430791527427, CurrSamplesPerSec=5.735725581354602, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:16:39,053] [INFO] [logging.py:68:log_dist] [Rank 0] step=1890, skipped=5, lr=[6.924444444444445e-06], mom=[[0.9, 0.999]]
[2022-12-17 02:16:39,055] [INFO] [timer.py:197:stop] 0/3780, RunningAvgSamplesPerSec=6.329434823557692, CurrSamplesPerSec=5.699065722949043, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:16:50,358] [INFO] [timer.py:197:stop] 0/3782, RunningAvgSamplesPerSec=6.329442138280489, CurrSamplesPerSec=5.71940057178781, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:17:01,661] [INFO] [timer.py:197:stop] 0/3784, RunningAvgSamplesPerSec=6.329449934445086, CurrSamplesPerSec=5.722445279523075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:17:12,916] [INFO] [timer.py:197:stop] 0/3786, RunningAvgSamplesPerSec=6.329473072416305, CurrSamplesPerSec=5.758911295903981, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:17:24,186] [INFO] [timer.py:197:stop] 0/3788, RunningAvgSamplesPerSec=6.329486457479711, CurrSamplesPerSec=5.717976383190484, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:17:35,494] [INFO] [timer.py:197:stop] 0/3790, RunningAvgSamplesPerSec=6.329492173183151, CurrSamplesPerSec=5.6981175218636375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:17:46,888] [INFO] [timer.py:197:stop] 0/3792, RunningAvgSamplesPerSec=6.329469265360669, CurrSamplesPerSec=5.614493217525514, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:17:58,160] [INFO] [timer.py:197:stop] 0/3794, RunningAvgSamplesPerSec=6.3294846704318015, CurrSamplesPerSec=5.712998884409758, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:18:09,443] [INFO] [timer.py:197:stop] 0/3796, RunningAvgSamplesPerSec=6.32949887820273, CurrSamplesPerSec=5.709194033786988, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:18:20,683] [INFO] [timer.py:197:stop] 0/3798, RunningAvgSamplesPerSec=6.329520651840991, CurrSamplesPerSec=5.732598171337711, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 02:18:32,056] [INFO] [logging.py:68:log_dist] [Rank 0] step=1900, skipped=5, lr=[6.902222222222223e-06], mom=[[0.9, 0.999]]
[2022-12-17 02:18:32,058] [INFO] [timer.py:197:stop] 0/3800,
RunningAvgSamplesPerSec=6.329503954260459, CurrSamplesPerSec=5.640483877827287, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0033, 'learning_rate': 6.902222222222223e-06, 'epoch': 8.05} [2022-12-17 02:18:43,327] [INFO] [timer.py:197:stop] 0/3802, RunningAvgSamplesPerSec=6.32952213607384, CurrSamplesPerSec=5.727358851306722, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:18:54,642] [INFO] [timer.py:197:stop] 0/3804, RunningAvgSamplesPerSec=6.329532860172703, CurrSamplesPerSec=5.715511497327968, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:19:05,943] [INFO] [timer.py:197:stop] 0/3806, RunningAvgSamplesPerSec=6.329545981695896, CurrSamplesPerSec=5.713660152961128, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:19:17,382] [INFO] [timer.py:197:stop] 0/3808, RunningAvgSamplesPerSec=6.329550794690124, CurrSamplesPerSec=5.6937915961488414, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:19:28,637] [INFO] [timer.py:197:stop] 0/3810, RunningAvgSamplesPerSec=6.329564877775189, CurrSamplesPerSec=5.722360863846393, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:19:39,942] [INFO] [timer.py:197:stop] 0/3812, RunningAvgSamplesPerSec=6.329571525169608, CurrSamplesPerSec=5.707313772986497, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:19:51,252] [INFO] [timer.py:197:stop] 0/3814, RunningAvgSamplesPerSec=6.3295763320249305, CurrSamplesPerSec=5.712068406390203, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:20:02,584] [INFO] [timer.py:197:stop] 0/3816, RunningAvgSamplesPerSec=6.329574370163669, CurrSamplesPerSec=5.693129605597974, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:20:13,911] [INFO] [timer.py:197:stop] 0/3818, RunningAvgSamplesPerSec=6.329574490134868, CurrSamplesPerSec=5.686367822968028, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:20:25,218] [INFO] [logging.py:68:log_dist] [Rank 0] step=1910, skipped=5, lr=[6.88e-06], mom=[[0.9, 0.999]] [2022-12-17 
02:20:25,219] [INFO] [timer.py:197:stop] 0/3820, RunningAvgSamplesPerSec=6.329570251685095, CurrSamplesPerSec=5.675956596931117, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:20:36,536] [INFO] [timer.py:197:stop] 0/3822, RunningAvgSamplesPerSec=6.3295741225492925, CurrSamplesPerSec=5.706256321703277, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:20:47,844] [INFO] [timer.py:197:stop] 0/3824, RunningAvgSamplesPerSec=6.3295797971395995, CurrSamplesPerSec=5.717985883552623, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:20:59,133] [INFO] [timer.py:197:stop] 0/3826, RunningAvgSamplesPerSec=6.3295918030274505, CurrSamplesPerSec=5.710409275474985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:21:10,404] [INFO] [timer.py:197:stop] 0/3828, RunningAvgSamplesPerSec=6.329599897041199, CurrSamplesPerSec=5.706553766089495, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:21:21,731] [INFO] [timer.py:197:stop] 0/3830, RunningAvgSamplesPerSec=6.329599217752504, CurrSamplesPerSec=5.688717211971074, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:21:33,027] [INFO] [timer.py:197:stop] 0/3832, RunningAvgSamplesPerSec=6.329603904750574, CurrSamplesPerSec=5.703930968262709, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:21:44,338] [INFO] [timer.py:197:stop] 0/3834, RunningAvgSamplesPerSec=6.329608718741985, CurrSamplesPerSec=5.703741172745922, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:21:55,646] [INFO] [timer.py:197:stop] 0/3836, RunningAvgSamplesPerSec=6.3296147038120765, CurrSamplesPerSec=5.7027053978494635, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:22:06,956] [INFO] [timer.py:197:stop] 0/3838, RunningAvgSamplesPerSec=6.329619610197565, CurrSamplesPerSec=5.684683858968662, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:22:18,241] [INFO] [logging.py:68:log_dist] [Rank 0] step=1920, skipped=5, lr=[6.857777777777779e-06], mom=[[0.9, 0.999]] [2022-12-17 
02:22:18,242] [INFO] [timer.py:197:stop] 0/3840, RunningAvgSamplesPerSec=6.329632723924941, CurrSamplesPerSec=5.705162885752711, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:22:29,542] [INFO] [timer.py:197:stop] 0/3842, RunningAvgSamplesPerSec=6.329637525436198, CurrSamplesPerSec=5.709632413326189, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:22:40,818] [INFO] [timer.py:197:stop] 0/3844, RunningAvgSamplesPerSec=6.329654567588082, CurrSamplesPerSec=5.723388903993835, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:22:52,114] [INFO] [timer.py:197:stop] 0/3846, RunningAvgSamplesPerSec=6.329664886989055, CurrSamplesPerSec=5.725275377311717, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:23:03,448] [INFO] [timer.py:197:stop] 0/3848, RunningAvgSamplesPerSec=6.329665610995146, CurrSamplesPerSec=5.708845564011121, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:23:14,755] [INFO] [timer.py:197:stop] 0/3850, RunningAvgSamplesPerSec=6.329672720076725, CurrSamplesPerSec=5.696633793536779, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0032, 'learning_rate': 6.846666666666667e-06, 'epoch': 8.16} [2022-12-17 02:23:26,091] [INFO] [timer.py:197:stop] 0/3852, RunningAvgSamplesPerSec=6.32967063641164, CurrSamplesPerSec=5.6939845950122026, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:23:37,356] [INFO] [timer.py:197:stop] 0/3854, RunningAvgSamplesPerSec=6.329685549843888, CurrSamplesPerSec=5.725235813794708, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:23:48,671] [INFO] [timer.py:197:stop] 0/3856, RunningAvgSamplesPerSec=6.329690025034303, CurrSamplesPerSec=5.696226659165504, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:24:00,016] [INFO] [timer.py:197:stop] 0/3858, RunningAvgSamplesPerSec=6.329685215773519, CurrSamplesPerSec=5.693312174994899, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:24:11,426] [INFO] [logging.py:68:log_dist] [Rank 0] step=1930, skipped=5, 
lr=[6.835555555555556e-06], mom=[[0.9, 0.999]] [2022-12-17 02:24:11,428] [INFO] [timer.py:197:stop] 0/3860, RunningAvgSamplesPerSec=6.329690796188916, CurrSamplesPerSec=5.701824776245682, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:24:22,723] [INFO] [timer.py:197:stop] 0/3862, RunningAvgSamplesPerSec=6.3296961670571354, CurrSamplesPerSec=5.69953982082623, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:24:34,021] [INFO] [timer.py:197:stop] 0/3864, RunningAvgSamplesPerSec=6.329705244869385, CurrSamplesPerSec=5.705117052093761, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:24:45,327] [INFO] [timer.py:197:stop] 0/3866, RunningAvgSamplesPerSec=6.329712194155368, CurrSamplesPerSec=5.705711976888372, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:24:56,636] [INFO] [timer.py:197:stop] 0/3868, RunningAvgSamplesPerSec=6.329722996438025, CurrSamplesPerSec=5.696094909021571, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:25:07,963] [INFO] [timer.py:197:stop] 0/3870, RunningAvgSamplesPerSec=6.329723610970681, CurrSamplesPerSec=5.699059915194582, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:25:19,267] [INFO] [timer.py:197:stop] 0/3872, RunningAvgSamplesPerSec=6.329730045520958, CurrSamplesPerSec=5.72029077571516, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:25:30,596] [INFO] [timer.py:197:stop] 0/3874, RunningAvgSamplesPerSec=6.329724069035336, CurrSamplesPerSec=5.675370020557602, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:25:41,918] [INFO] [timer.py:197:stop] 0/3876, RunningAvgSamplesPerSec=6.329725765685623, CurrSamplesPerSec=5.6958365030451725, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:25:53,221] [INFO] [timer.py:197:stop] 0/3878, RunningAvgSamplesPerSec=6.329733146930267, CurrSamplesPerSec=5.705232486420832, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:26:04,538] [INFO] [logging.py:68:log_dist] [Rank 0] step=1940, skipped=5, 
lr=[6.813333333333334e-06], mom=[[0.9, 0.999]] [2022-12-17 02:26:04,539] [INFO] [timer.py:197:stop] 0/3880, RunningAvgSamplesPerSec=6.329736432320369, CurrSamplesPerSec=5.710810421201311, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:26:15,836] [INFO] [timer.py:197:stop] 0/3882, RunningAvgSamplesPerSec=6.32974185638214, CurrSamplesPerSec=5.701551560238876, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:26:27,147] [INFO] [timer.py:197:stop] 0/3884, RunningAvgSamplesPerSec=6.32974730470015, CurrSamplesPerSec=5.702902636224327, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:26:38,434] [INFO] [timer.py:197:stop] 0/3886, RunningAvgSamplesPerSec=6.329757043427764, CurrSamplesPerSec=5.710765711679313, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:26:49,740] [INFO] [timer.py:197:stop] 0/3888, RunningAvgSamplesPerSec=6.3297655812523566, CurrSamplesPerSec=5.7131039378135435, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:27:01,009] [INFO] [timer.py:197:stop] 0/3890, RunningAvgSamplesPerSec=6.329781514470751, CurrSamplesPerSec=5.732171681234439, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:27:12,305] [INFO] [timer.py:197:stop] 0/3892, RunningAvgSamplesPerSec=6.329792827236585, CurrSamplesPerSec=5.713371452351093, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:27:23,746] [INFO] [timer.py:197:stop] 0/3894, RunningAvgSamplesPerSec=6.329804099269943, CurrSamplesPerSec=5.718544024331364, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:27:35,062] [INFO] [timer.py:197:stop] 0/3896, RunningAvgSamplesPerSec=6.32980797636874, CurrSamplesPerSec=5.699682864311109, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:27:46,340] [INFO] [timer.py:197:stop] 0/3898, RunningAvgSamplesPerSec=6.329824368151498, CurrSamplesPerSec=5.735463077042991, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:27:57,678] [INFO] [logging.py:68:log_dist] [Rank 0] step=1950, skipped=5, 
lr=[6.7911111111111115e-06], mom=[[0.9, 0.999]] [2022-12-17 02:27:57,679] [INFO] [timer.py:197:stop] 0/3900, RunningAvgSamplesPerSec=6.32982140713299, CurrSamplesPerSec=5.697316672085901, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0033, 'learning_rate': 6.7911111111111115e-06, 'epoch': 8.26} [2022-12-17 02:28:09,037] [INFO] [timer.py:197:stop] 0/3902, RunningAvgSamplesPerSec=6.329814432386257, CurrSamplesPerSec=5.684121475399644, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:28:20,330] [INFO] [timer.py:197:stop] 0/3904, RunningAvgSamplesPerSec=6.329825137878348, CurrSamplesPerSec=5.714867563515535, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:28:31,665] [INFO] [timer.py:197:stop] 0/3906, RunningAvgSamplesPerSec=6.329816776466052, CurrSamplesPerSec=5.673013657138756, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:28:42,966] [INFO] [timer.py:197:stop] 0/3908, RunningAvgSamplesPerSec=6.329824875290936, CurrSamplesPerSec=5.693127915195306, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:28:54,283] [INFO] [timer.py:197:stop] 0/3910, RunningAvgSamplesPerSec=6.329828643103604, CurrSamplesPerSec=5.70661806279359, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:29:05,642] [INFO] [timer.py:197:stop] 0/3912, RunningAvgSamplesPerSec=6.329813334548276, CurrSamplesPerSec=5.662994082657486, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:29:16,978] [INFO] [timer.py:197:stop] 0/3914, RunningAvgSamplesPerSec=6.329811090289342, CurrSamplesPerSec=5.696075086591611, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:29:28,257] [INFO] [timer.py:197:stop] 0/3916, RunningAvgSamplesPerSec=6.329826773522238, CurrSamplesPerSec=5.717712090899444, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:29:39,602] [INFO] [timer.py:197:stop] 0/3918, RunningAvgSamplesPerSec=6.329820899907599, CurrSamplesPerSec=5.685816185999066, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:29:50,928] 
[INFO] [logging.py:68:log_dist] [Rank 0] step=1960, skipped=5, lr=[6.768888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 02:29:50,929] [INFO] [timer.py:197:stop] 0/3920, RunningAvgSamplesPerSec=6.329822966762694, CurrSamplesPerSec=5.708094131136135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:30:02,254] [INFO] [timer.py:197:stop] 0/3922, RunningAvgSamplesPerSec=6.329823766455425, CurrSamplesPerSec=5.702115460318731, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:30:13,585] [INFO] [timer.py:197:stop] 0/3924, RunningAvgSamplesPerSec=6.329823133167677, CurrSamplesPerSec=5.6947779843335375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:30:24,876] [INFO] [timer.py:197:stop] 0/3926, RunningAvgSamplesPerSec=6.329829857024279, CurrSamplesPerSec=5.710925114045982, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:30:36,201] [INFO] [timer.py:197:stop] 0/3928, RunningAvgSamplesPerSec=6.329830280622669, CurrSamplesPerSec=5.70457171231726, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:30:47,521] [INFO] [timer.py:197:stop] 0/3930, RunningAvgSamplesPerSec=6.329826977832305, CurrSamplesPerSec=5.68627989112763, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:30:58,834] [INFO] [timer.py:197:stop] 0/3932, RunningAvgSamplesPerSec=6.329825438915529, CurrSamplesPerSec=5.691355959464071, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:31:10,157] [INFO] [timer.py:197:stop] 0/3934, RunningAvgSamplesPerSec=6.3298256067430785, CurrSamplesPerSec=5.703385855202171, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:31:21,457] [INFO] [timer.py:197:stop] 0/3936, RunningAvgSamplesPerSec=6.329834354702879, CurrSamplesPerSec=5.703952057433165, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:31:32,788] [INFO] [timer.py:197:stop] 0/3938, RunningAvgSamplesPerSec=6.329832611573973, CurrSamplesPerSec=5.681675823878106, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:31:44,075] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=1970, skipped=5, lr=[6.746666666666667e-06], mom=[[0.9, 0.999]] [2022-12-17 02:31:44,076] [INFO] [timer.py:197:stop] 0/3940, RunningAvgSamplesPerSec=6.329846029232881, CurrSamplesPerSec=5.723946147254611, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:31:55,411] [INFO] [timer.py:197:stop] 0/3942, RunningAvgSamplesPerSec=6.32984408328006, CurrSamplesPerSec=5.6886128123952995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:32:06,722] [INFO] [timer.py:197:stop] 0/3944, RunningAvgSamplesPerSec=6.329853290493203, CurrSamplesPerSec=5.7044201802018675, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:32:18,031] [INFO] [timer.py:197:stop] 0/3946, RunningAvgSamplesPerSec=6.329859515800882, CurrSamplesPerSec=5.689871890505715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:32:29,345] [INFO] [timer.py:197:stop] 0/3948, RunningAvgSamplesPerSec=6.329865167645183, CurrSamplesPerSec=5.704709674156191, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:32:40,666] [INFO] [timer.py:197:stop] 0/3950, RunningAvgSamplesPerSec=6.3298671032413685, CurrSamplesPerSec=5.698391134644368, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0035, 'learning_rate': 6.735555555555556e-06, 'epoch': 8.37} [2022-12-17 02:32:51,918] [INFO] [timer.py:197:stop] 0/3952, RunningAvgSamplesPerSec=6.3298891298894295, CurrSamplesPerSec=5.735413324032536, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:33:03,196] [INFO] [timer.py:197:stop] 0/3954, RunningAvgSamplesPerSec=6.329902086633342, CurrSamplesPerSec=5.721910526340695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:33:14,499] [INFO] [timer.py:197:stop] 0/3956, RunningAvgSamplesPerSec=6.329905822912193, CurrSamplesPerSec=5.691245912501485, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:33:25,826] [INFO] [timer.py:197:stop] 0/3958, RunningAvgSamplesPerSec=6.329906249651454, CurrSamplesPerSec=5.686801981118413, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:33:37,181] [INFO] [logging.py:68:log_dist] [Rank 0] step=1980, skipped=5, lr=[6.724444444444444e-06], mom=[[0.9, 0.999]] [2022-12-17 02:33:37,182] [INFO] [timer.py:197:stop] 0/3960, RunningAvgSamplesPerSec=6.329902215016532, CurrSamplesPerSec=5.65284436763619, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:33:48,619] [INFO] [timer.py:197:stop] 0/3962, RunningAvgSamplesPerSec=6.329875482170933, CurrSamplesPerSec=5.621293394307236, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:34:00,103] [INFO] [timer.py:197:stop] 0/3964, RunningAvgSamplesPerSec=6.329851487880202, CurrSamplesPerSec=5.641696364327095, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:34:11,445] [INFO] [timer.py:197:stop] 0/3966, RunningAvgSamplesPerSec=6.329844461992682, CurrSamplesPerSec=5.665015728261031, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:34:22,785] [INFO] [timer.py:197:stop] 0/3968, RunningAvgSamplesPerSec=6.329850249763681, CurrSamplesPerSec=5.709415765273105, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:34:34,214] [INFO] [timer.py:197:stop] 0/3970, RunningAvgSamplesPerSec=6.329839127590746, CurrSamplesPerSec=5.679090016835304, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:34:45,611] [INFO] [timer.py:197:stop] 0/3972, RunningAvgSamplesPerSec=6.329838474291728, CurrSamplesPerSec=5.683097385387977, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:34:57,039] [INFO] [timer.py:197:stop] 0/3974, RunningAvgSamplesPerSec=6.329831710087475, CurrSamplesPerSec=5.673985662776667, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:35:08,398] [INFO] [timer.py:197:stop] 0/3976, RunningAvgSamplesPerSec=6.329831826261857, CurrSamplesPerSec=5.678151333205712, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:35:19,776] [INFO] [timer.py:197:stop] 0/3978, RunningAvgSamplesPerSec=6.329833221018402, CurrSamplesPerSec=5.700546360450433, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:35:31,153] [INFO] [logging.py:68:log_dist] [Rank 0] step=1990, skipped=5, lr=[6.702222222222224e-06], mom=[[0.9, 0.999]] [2022-12-17 02:35:31,155] [INFO] [timer.py:197:stop] 0/3980, RunningAvgSamplesPerSec=6.329824364419316, CurrSamplesPerSec=5.666521307084508, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:35:42,508] [INFO] [timer.py:197:stop] 0/3982, RunningAvgSamplesPerSec=6.329817725488676, CurrSamplesPerSec=5.68016122078727, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:35:53,948] [INFO] [timer.py:197:stop] 0/3984, RunningAvgSamplesPerSec=6.329809411355801, CurrSamplesPerSec=5.673447457141223, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:36:05,340] [INFO] [timer.py:197:stop] 0/3986, RunningAvgSamplesPerSec=6.329800045419625, CurrSamplesPerSec=5.673120602221541, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:36:16,732] [INFO] [timer.py:197:stop] 0/3988, RunningAvgSamplesPerSec=6.32980055398527, CurrSamplesPerSec=5.690515269998576, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:36:28,053] [INFO] [timer.py:197:stop] 0/3990, RunningAvgSamplesPerSec=6.329802810481422, CurrSamplesPerSec=5.70927976144318, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:36:39,397] [INFO] [timer.py:197:stop] 0/3992, RunningAvgSamplesPerSec=6.329802819995766, CurrSamplesPerSec=5.675330423820994, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:36:50,710] [INFO] [timer.py:197:stop] 0/3994, RunningAvgSamplesPerSec=6.329812884980746, CurrSamplesPerSec=5.71857594226552, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:37:02,127] [INFO] [timer.py:197:stop] 0/3996, RunningAvgSamplesPerSec=6.3297965344729885, CurrSamplesPerSec=5.635572645721929, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:37:13,514] [INFO] [timer.py:197:stop] 0/3998, RunningAvgSamplesPerSec=6.329792063006823, CurrSamplesPerSec=5.678675775718042, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:37:24,866] [INFO] [logging.py:68:log_dist] [Rank 0] step=2000, skipped=5, lr=[6.680000000000001e-06], mom=[[0.9, 0.999]] [2022-12-17 02:37:24,868] [INFO] [timer.py:197:stop] 0/4000, RunningAvgSamplesPerSec=6.3297894302029585, CurrSamplesPerSec=5.6972066363266105, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0034, 'learning_rate': 6.680000000000001e-06, 'epoch': 8.47} {'eval_loss': 0.18408203125, 'eval_wer': 9.830389863906742, 'eval_runtime': 2143.7282, 'eval_samples_per_second': 3.598, 'eval_steps_per_second': 0.45, 'epoch': 8.47} [2022-12-17 03:13:12,165] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step2000 is begin to save! [2022-12-17 03:13:12,175] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-2000/global_step2000/mp_rank_00_model_states.pt [2022-12-17 03:13:12,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving ./checkpoint-2000/global_step2000/mp_rank_00_model_states.pt... [2022-12-17 03:13:15,703] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved ./checkpoint-2000/global_step2000/mp_rank_00_model_states.pt. [2022-12-17 03:13:15,704] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving ./checkpoint-2000/global_step2000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2022-12-17 03:13:31,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved ./checkpoint-2000/global_step2000/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2022-12-17 03:13:31,523] [INFO] [engine.py:3269:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-2000/global_step2000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-12-17 03:13:31,523] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 
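The `lr` values in the `log_dist` lines trace the linear warmup/decay schedule configured for this run: `learning_rate=1e-05`, `warmup_steps=500`, `max_steps=5000`, `lr_scheduler_type=linear`. Below is a minimal Python sketch of that schedule. It assumes the schedule has the usual shape: a linear ramp from 0 to the base rate, then a linear decay to 0. `linear_schedule_lr` is a hypothetical helper, not a library function.

The `SCHEDULER_LAG` of 6 steps is also an assumption. It is inferred by matching the logged values and is not stated anywhere in the log. It is plausibly related to the `skipped=5` fp16-overflow steps that DeepSpeed reports.

```python
# Sketch: linear warmup then linear decay to zero, as configured for this run
# (learning_rate=1e-05, warmup_steps=500, max_steps=5000, lr_scheduler_type=linear).
# `linear_schedule_lr` is a hypothetical helper, not part of any library.
def linear_schedule_lr(step, base_lr=1e-5, warmup_steps=500, total_steps=5000):
    """Return the learning rate after `step` scheduler steps."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr to 0 at total_steps.
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Assumption inferred from the logged values (not stated in the log): the
# scheduler trails the DeepSpeed optimizer step counter by 6 steps here.
SCHEDULER_LAG = 6

# DeepSpeed step -> lr, copied from the log_dist lines in this log excerpt.
LOGGED_LR = {
    1870: 6.96888888888889e-06,
    1900: 6.902222222222223e-06,
    2000: 6.680000000000001e-06,
    2060: 6.546666666666667e-06,
}

if __name__ == "__main__":
    for ds_step, logged in LOGGED_LR.items():
        predicted = linear_schedule_lr(ds_step - SCHEDULER_LAG)
        print(f"step={ds_step} logged={logged:.6e} predicted={predicted:.6e}")
```

With the 6-step lag, each sampled prediction matches the logged value to within floating-point rounding. Without the lag, every prediction would be too low by six steps' worth of decay, roughly 1.3e-08.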
[2022-12-17 03:15:34,464] [INFO] [timer.py:197:stop] 0/4002, RunningAvgSamplesPerSec=6.32971650220364, CurrSamplesPerSec=5.432966773902989, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:15:46,027] [INFO] [timer.py:197:stop] 0/4004, RunningAvgSamplesPerSec=6.329736699376116, CurrSamplesPerSec=5.732430946357427, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:15:57,711] [INFO] [timer.py:197:stop] 0/4006, RunningAvgSamplesPerSec=6.329714692420776, CurrSamplesPerSec=5.7052121153177975, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:16:09,000] [INFO] [timer.py:197:stop] 0/4008, RunningAvgSamplesPerSec=6.329729049348178, CurrSamplesPerSec=5.721881254416873, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:16:20,309] [INFO] [timer.py:197:stop] 0/4010, RunningAvgSamplesPerSec=6.32973210769902, CurrSamplesPerSec=5.6921966559580985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:16:31,950] [INFO] [timer.py:197:stop] 0/4012, RunningAvgSamplesPerSec=6.329631531896987, CurrSamplesPerSec=5.6903598999459915, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:16:43,277] [INFO] [timer.py:197:stop] 0/4014, RunningAvgSamplesPerSec=6.3296284082486265, CurrSamplesPerSec=5.682677266268421, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:16:54,617] [INFO] [timer.py:197:stop] 0/4016, RunningAvgSamplesPerSec=6.329620990535829, CurrSamplesPerSec=5.6637007057802835, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:17:06,320] [INFO] [timer.py:197:stop] 0/4018, RunningAvgSamplesPerSec=6.32961940678384, CurrSamplesPerSec=5.690537948945658, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:17:17,689] [INFO] [logging.py:68:log_dist] [Rank 0] step=2010, skipped=5, lr=[6.657777777777779e-06], mom=[[0.9, 0.999]]
[2022-12-17 03:17:17,691] [INFO] [timer.py:197:stop] 0/4020, RunningAvgSamplesPerSec=6.329603248560779, CurrSamplesPerSec=5.636197649705268, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:17:28,951] [INFO] [timer.py:197:stop] 0/4022, RunningAvgSamplesPerSec=6.329621839411704, CurrSamplesPerSec=5.729020022201075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:17:40,565] [INFO] [timer.py:197:stop] 0/4024, RunningAvgSamplesPerSec=6.329636464480563, CurrSamplesPerSec=5.72667583837786, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:17:51,974] [INFO] [timer.py:197:stop] 0/4026, RunningAvgSamplesPerSec=6.329651877567379, CurrSamplesPerSec=5.730219255254423, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:18:03,329] [INFO] [timer.py:197:stop] 0/4028, RunningAvgSamplesPerSec=6.3296407044135305, CurrSamplesPerSec=5.6457342331832026, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:18:14,820] [INFO] [timer.py:197:stop] 0/4030, RunningAvgSamplesPerSec=6.329647226145717, CurrSamplesPerSec=5.706841778088452, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:18:26,288] [INFO] [timer.py:197:stop] 0/4032, RunningAvgSamplesPerSec=6.329657786556081, CurrSamplesPerSec=5.715247919575367, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:18:37,684] [INFO] [timer.py:197:stop] 0/4034, RunningAvgSamplesPerSec=6.329633943231989, CurrSamplesPerSec=5.610441031101313, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:18:49,059] [INFO] [timer.py:197:stop] 0/4036, RunningAvgSamplesPerSec=6.329643210614384, CurrSamplesPerSec=5.71373385290509, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:19:00,385] [INFO] [timer.py:197:stop] 0/4038, RunningAvgSamplesPerSec=6.329647008206662, CurrSamplesPerSec=5.697957140380308, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:19:11,655] [INFO] [logging.py:68:log_dist] [Rank 0] step=2020, skipped=5, lr=[6.6355555555555565e-06], mom=[[0.9, 0.999]]
[2022-12-17 03:19:11,656] [INFO] [timer.py:197:stop] 0/4040, RunningAvgSamplesPerSec=6.329656548870118, CurrSamplesPerSec=5.712982834924393, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:19:22,964] [INFO] [timer.py:197:stop] 0/4042, RunningAvgSamplesPerSec=6.329664344944999, CurrSamplesPerSec=5.707049494758968, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:19:34,251] [INFO] [timer.py:197:stop] 0/4044, RunningAvgSamplesPerSec=6.329677524006703, CurrSamplesPerSec=5.702061923775371, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:19:45,586] [INFO] [timer.py:197:stop] 0/4046, RunningAvgSamplesPerSec=6.329673899017344, CurrSamplesPerSec=5.676386285852038, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:19:56,869] [INFO] [timer.py:197:stop] 0/4048, RunningAvgSamplesPerSec=6.329685060089817, CurrSamplesPerSec=5.705445179125403, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:20:08,153] [INFO] [timer.py:197:stop] 0/4050, RunningAvgSamplesPerSec=6.329690147179106, CurrSamplesPerSec=5.718421959793726, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0044, 'learning_rate': 6.6244444444444445e-06, 'epoch': 8.58}
[2022-12-17 03:20:19,482] [INFO] [timer.py:197:stop] 0/4052, RunningAvgSamplesPerSec=6.329687546775836, CurrSamplesPerSec=5.677962528789538, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:20:30,761] [INFO] [timer.py:197:stop] 0/4054, RunningAvgSamplesPerSec=6.329704134400702, CurrSamplesPerSec=5.731804735092358, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:20:42,053] [INFO] [timer.py:197:stop] 0/4056, RunningAvgSamplesPerSec=6.3297167432347985, CurrSamplesPerSec=5.729271665429117, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:20:53,666] [INFO] [timer.py:197:stop] 0/4058, RunningAvgSamplesPerSec=6.329700568900112, CurrSamplesPerSec=5.670981747554733, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:21:04,942] [INFO] [logging.py:68:log_dist] [Rank 0] step=2030, skipped=5, lr=[6.613333333333334e-06], mom=[[0.9, 0.999]]
[2022-12-17 03:21:04,944] [INFO] [timer.py:197:stop] 0/4060, RunningAvgSamplesPerSec=6.329714148054933, CurrSamplesPerSec=5.7095685344444, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:21:16,406] [INFO] [timer.py:197:stop] 0/4062, RunningAvgSamplesPerSec=6.32971542335254, CurrSamplesPerSec=5.707395803713119, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:21:27,826] [INFO] [timer.py:197:stop] 0/4064, RunningAvgSamplesPerSec=6.329716546008745, CurrSamplesPerSec=5.7032450492428435, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:21:39,082] [INFO] [timer.py:197:stop] 0/4066, RunningAvgSamplesPerSec=6.329727383544931, CurrSamplesPerSec=5.708138313313967, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:21:50,416] [INFO] [timer.py:197:stop] 0/4068, RunningAvgSamplesPerSec=6.329723384009874, CurrSamplesPerSec=5.685249002830177, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:22:01,855] [INFO] [timer.py:197:stop] 0/4070, RunningAvgSamplesPerSec=6.329735288298022, CurrSamplesPerSec=5.720577250197871, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:22:13,353] [INFO] [timer.py:197:stop] 0/4072, RunningAvgSamplesPerSec=6.329753146864072, CurrSamplesPerSec=5.725375509643616, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:22:24,779] [INFO] [timer.py:197:stop] 0/4074, RunningAvgSamplesPerSec=6.329715954461023, CurrSamplesPerSec=5.568953772876529, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:22:36,381] [INFO] [timer.py:197:stop] 0/4076, RunningAvgSamplesPerSec=6.329719132052436, CurrSamplesPerSec=5.680185980782685, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:22:47,778] [INFO] [timer.py:197:stop] 0/4078, RunningAvgSamplesPerSec=6.3297134755114035, CurrSamplesPerSec=5.69351552665189, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:22:59,261] [INFO] [logging.py:68:log_dist] [Rank 0] step=2040, skipped=5, lr=[6.591111111111112e-06], mom=[[0.9, 0.999]]
[2022-12-17 03:22:59,263] [INFO] [timer.py:197:stop] 0/4080, RunningAvgSamplesPerSec=6.329662099764259, CurrSamplesPerSec=5.5199089195498106, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:23:10,789] [INFO] [timer.py:197:stop] 0/4082, RunningAvgSamplesPerSec=6.32966716446522, CurrSamplesPerSec=5.701408180754435, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:23:22,027] [INFO] [timer.py:197:stop] 0/4084, RunningAvgSamplesPerSec=6.329677835786168, CurrSamplesPerSec=5.70064635601823, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:23:33,530] [INFO] [timer.py:197:stop] 0/4086, RunningAvgSamplesPerSec=6.329621485972424, CurrSamplesPerSec=5.497445663780493, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:23:44,854] [INFO] [timer.py:197:stop] 0/4088, RunningAvgSamplesPerSec=6.3296199090930605, CurrSamplesPerSec=5.686864869676773, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:23:56,148] [INFO] [timer.py:197:stop] 0/4090, RunningAvgSamplesPerSec=6.329627497694356, CurrSamplesPerSec=5.707074489723329, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:24:07,626] [INFO] [timer.py:197:stop] 0/4092, RunningAvgSamplesPerSec=6.329579498207154, CurrSamplesPerSec=5.5263138434487615, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:24:18,910] [INFO] [timer.py:197:stop] 0/4094, RunningAvgSamplesPerSec=6.3295852834765025, CurrSamplesPerSec=5.718602012903586, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:24:30,243] [INFO] [timer.py:197:stop] 0/4096, RunningAvgSamplesPerSec=6.329582388154523, CurrSamplesPerSec=5.686816197184262, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:24:41,558] [INFO] [timer.py:197:stop] 0/4098, RunningAvgSamplesPerSec=6.329574192901616, CurrSamplesPerSec=5.650972720985214, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:24:52,865] [INFO] [logging.py:68:log_dist] [Rank 0] step=2050, skipped=5, lr=[6.568888888888889e-06], mom=[[0.9, 0.999]]
[2022-12-17 03:24:52,867] [INFO] [timer.py:197:stop] 0/4100, RunningAvgSamplesPerSec=6.329577261261745, CurrSamplesPerSec=5.709207876322052, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0038, 'learning_rate': 6.568888888888889e-06, 'epoch': 8.69}
[2022-12-17 03:25:04,148] [INFO] [timer.py:197:stop] 0/4102, RunningAvgSamplesPerSec=6.329584710681415, CurrSamplesPerSec=5.7096530588978, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:25:15,433] [INFO] [timer.py:197:stop] 0/4104, RunningAvgSamplesPerSec=6.329591737707764, CurrSamplesPerSec=5.7040087808033695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:25:26,720] [INFO] [timer.py:197:stop] 0/4106, RunningAvgSamplesPerSec=6.3296020891744265, CurrSamplesPerSec=5.718893922820299, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:25:38,007] [INFO] [timer.py:197:stop] 0/4108, RunningAvgSamplesPerSec=6.329611851038329, CurrSamplesPerSec=5.717244706560494, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:25:49,318] [INFO] [timer.py:197:stop] 0/4110, RunningAvgSamplesPerSec=6.32961398025573, CurrSamplesPerSec=5.707869589430014, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:26:00,697] [INFO] [timer.py:197:stop] 0/4112, RunningAvgSamplesPerSec=6.329599969613452, CurrSamplesPerSec=5.651735367307539, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:26:11,995] [INFO] [timer.py:197:stop] 0/4114, RunningAvgSamplesPerSec=6.329607728066423, CurrSamplesPerSec=5.702019531298125, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:26:23,266] [INFO] [timer.py:197:stop] 0/4116, RunningAvgSamplesPerSec=6.329624606775776, CurrSamplesPerSec=5.729385878152315, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:26:34,582] [INFO] [timer.py:197:stop] 0/4118, RunningAvgSamplesPerSec=6.3296290795189885, CurrSamplesPerSec=5.688858265982743, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:26:45,872] [INFO] [logging.py:68:log_dist] [Rank 0] step=2060, skipped=5, lr=[6.546666666666667e-06], mom=[[0.9, 0.999]]
[2022-12-17 03:26:45,874] [INFO] 
[timer.py:197:stop] 0/4120, RunningAvgSamplesPerSec=6.3296391052996706, CurrSamplesPerSec=5.710607776037591, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:26:57,215] [INFO] [timer.py:197:stop] 0/4122, RunningAvgSamplesPerSec=6.3296344831495865, CurrSamplesPerSec=5.668019787475643, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:27:08,515] [INFO] [timer.py:197:stop] 0/4124, RunningAvgSamplesPerSec=6.3296411666454695, CurrSamplesPerSec=5.702394787187077, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:27:19,807] [INFO] [timer.py:197:stop] 0/4126, RunningAvgSamplesPerSec=6.3296543447409706, CurrSamplesPerSec=5.716563129758426, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:27:31,092] [INFO] [timer.py:197:stop] 0/4128, RunningAvgSamplesPerSec=6.329665769004512, CurrSamplesPerSec=5.713112449253873, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:27:42,402] [INFO] [timer.py:197:stop] 0/4130, RunningAvgSamplesPerSec=6.329669537874054, CurrSamplesPerSec=5.685855206581097, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:27:53,659] [INFO] [timer.py:197:stop] 0/4132, RunningAvgSamplesPerSec=6.329684738201763, CurrSamplesPerSec=5.730757275477921, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:28:05,083] [INFO] [timer.py:197:stop] 0/4134, RunningAvgSamplesPerSec=6.329653695177852, CurrSamplesPerSec=5.704550376108452, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:28:16,391] [INFO] [timer.py:197:stop] 0/4136, RunningAvgSamplesPerSec=6.329658226333892, CurrSamplesPerSec=5.669268567918603, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:28:27,660] [INFO] [timer.py:197:stop] 0/4138, RunningAvgSamplesPerSec=6.329669431371385, CurrSamplesPerSec=5.7140525115073775, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:28:39,033] [INFO] [logging.py:68:log_dist] [Rank 0] step=2070, skipped=5, lr=[6.524444444444445e-06], mom=[[0.9, 0.999]] [2022-12-17 03:28:39,035] [INFO] 
[timer.py:197:stop] 0/4140, RunningAvgSamplesPerSec=6.329653569925222, CurrSamplesPerSec=5.702104074634847, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:28:50,314] [INFO] [timer.py:197:stop] 0/4142, RunningAvgSamplesPerSec=6.329667150732836, CurrSamplesPerSec=5.711580798448425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:29:01,628] [INFO] [timer.py:197:stop] 0/4144, RunningAvgSamplesPerSec=6.329672264319184, CurrSamplesPerSec=5.7046015347856525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:29:13,190] [INFO] [timer.py:197:stop] 0/4146, RunningAvgSamplesPerSec=6.329590255909673, CurrSamplesPerSec=5.693526153503661, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:29:24,479] [INFO] [timer.py:197:stop] 0/4148, RunningAvgSamplesPerSec=6.329600253682247, CurrSamplesPerSec=5.712139877360979, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:29:35,740] [INFO] [timer.py:197:stop] 0/4150, RunningAvgSamplesPerSec=6.329614214241194, CurrSamplesPerSec=5.721944677297023, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0043, 'learning_rate': 6.513333333333333e-06, 'epoch': 8.79} [2022-12-17 03:29:47,451] [INFO] [timer.py:197:stop] 0/4152, RunningAvgSamplesPerSec=6.329625743832189, CurrSamplesPerSec=5.713240853488009, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:29:58,927] [INFO] [timer.py:197:stop] 0/4154, RunningAvgSamplesPerSec=6.32962401605577, CurrSamplesPerSec=5.681033239220668, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:30:10,298] [INFO] [timer.py:197:stop] 0/4156, RunningAvgSamplesPerSec=6.329609371145642, CurrSamplesPerSec=5.640112222567924, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:30:21,742] [INFO] [timer.py:197:stop] 0/4158, RunningAvgSamplesPerSec=6.329619336010558, CurrSamplesPerSec=5.725745541206016, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:30:33,113] [INFO] [logging.py:68:log_dist] [Rank 0] step=2080, skipped=5, 
lr=[6.502222222222223e-06], mom=[[0.9, 0.999]] [2022-12-17 03:30:33,114] [INFO] [timer.py:197:stop] 0/4160, RunningAvgSamplesPerSec=6.3296304931523135, CurrSamplesPerSec=5.719744481367389, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:30:44,377] [INFO] [timer.py:197:stop] 0/4162, RunningAvgSamplesPerSec=6.329639665828144, CurrSamplesPerSec=5.7231092248855, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:30:55,952] [INFO] [timer.py:197:stop] 0/4164, RunningAvgSamplesPerSec=6.329642890850235, CurrSamplesPerSec=5.7104019868553015, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:31:07,478] [INFO] [timer.py:197:stop] 0/4166, RunningAvgSamplesPerSec=6.329641465120764, CurrSamplesPerSec=5.693680488845232, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:31:18,787] [INFO] [timer.py:197:stop] 0/4168, RunningAvgSamplesPerSec=6.329646013398303, CurrSamplesPerSec=5.698077365169859, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:31:30,204] [INFO] [timer.py:197:stop] 0/4170, RunningAvgSamplesPerSec=6.32965932265026, CurrSamplesPerSec=5.731016412667152, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:31:41,572] [INFO] [timer.py:197:stop] 0/4172, RunningAvgSamplesPerSec=6.329659992211202, CurrSamplesPerSec=5.703579747419184, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:31:53,022] [INFO] [timer.py:197:stop] 0/4174, RunningAvgSamplesPerSec=6.329620320914269, CurrSamplesPerSec=5.552174278511241, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:32:04,605] [INFO] [timer.py:197:stop] 0/4176, RunningAvgSamplesPerSec=6.329618251779879, CurrSamplesPerSec=5.664145751377774, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:32:16,047] [INFO] [timer.py:197:stop] 0/4178, RunningAvgSamplesPerSec=6.329620119317804, CurrSamplesPerSec=5.691267149303224, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:32:27,557] [INFO] [logging.py:68:log_dist] [Rank 0] step=2090, skipped=5, 
lr=[6.480000000000001e-06], mom=[[0.9, 0.999]] [2022-12-17 03:32:27,559] [INFO] [timer.py:197:stop] 0/4180, RunningAvgSamplesPerSec=6.329559581631733, CurrSamplesPerSec=5.49025277515664, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:32:38,847] [INFO] [timer.py:197:stop] 0/4182, RunningAvgSamplesPerSec=6.329566571122446, CurrSamplesPerSec=5.7119519660201625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:32:50,221] [INFO] [timer.py:197:stop] 0/4184, RunningAvgSamplesPerSec=6.329551512684384, CurrSamplesPerSec=5.658288559220713, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:33:01,779] [INFO] [timer.py:197:stop] 0/4186, RunningAvgSamplesPerSec=6.329482129285263, CurrSamplesPerSec=5.455197789627966, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:33:13,234] [INFO] [timer.py:197:stop] 0/4188, RunningAvgSamplesPerSec=6.329489056102356, CurrSamplesPerSec=5.689115074579025, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:33:24,534] [INFO] [timer.py:197:stop] 0/4190, RunningAvgSamplesPerSec=6.3294930470222965, CurrSamplesPerSec=5.715814775678154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:33:36,200] [INFO] [timer.py:197:stop] 0/4192, RunningAvgSamplesPerSec=6.329385693595587, CurrSamplesPerSec=5.342715526710825, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:33:47,585] [INFO] [timer.py:197:stop] 0/4194, RunningAvgSamplesPerSec=6.329387920504537, CurrSamplesPerSec=5.708342239921322, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:33:58,934] [INFO] [timer.py:197:stop] 0/4196, RunningAvgSamplesPerSec=6.329393759067435, CurrSamplesPerSec=5.698761557733716, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:34:10,299] [INFO] [timer.py:197:stop] 0/4198, RunningAvgSamplesPerSec=6.329377635096463, CurrSamplesPerSec=5.627035752573431, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:34:21,607] [INFO] [logging.py:68:log_dist] [Rank 0] step=2100, skipped=5, 
lr=[6.457777777777778e-06], mom=[[0.9, 0.999]] [2022-12-17 03:34:21,609] [INFO] [timer.py:197:stop] 0/4200, RunningAvgSamplesPerSec=6.329387313311945, CurrSamplesPerSec=5.706954612955244, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0033, 'learning_rate': 6.457777777777778e-06, 'epoch': 8.9} [2022-12-17 03:34:32,898] [INFO] [timer.py:197:stop] 0/4202, RunningAvgSamplesPerSec=6.3293938964875185, CurrSamplesPerSec=5.699129125038513, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:34:44,297] [INFO] [timer.py:197:stop] 0/4204, RunningAvgSamplesPerSec=6.329363594365425, CurrSamplesPerSec=5.581797360025495, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:34:55,847] [INFO] [timer.py:197:stop] 0/4206, RunningAvgSamplesPerSec=6.329371810931073, CurrSamplesPerSec=5.705324158184767, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:35:07,174] [INFO] [timer.py:197:stop] 0/4208, RunningAvgSamplesPerSec=6.3293724138440295, CurrSamplesPerSec=5.683446328781715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:35:18,805] [INFO] [timer.py:197:stop] 0/4210, RunningAvgSamplesPerSec=6.329274039267405, CurrSamplesPerSec=5.35519432319045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:35:30,104] [INFO] [timer.py:197:stop] 0/4212, RunningAvgSamplesPerSec=6.32927545611261, CurrSamplesPerSec=5.688271912984834, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:35:41,411] [INFO] [timer.py:197:stop] 0/4214, RunningAvgSamplesPerSec=6.329279504683213, CurrSamplesPerSec=5.702676806686563, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:35:52,661] [INFO] [timer.py:197:stop] 0/4216, RunningAvgSamplesPerSec=6.3292975459712455, CurrSamplesPerSec=5.735788331041529, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:36:03,980] [INFO] [timer.py:197:stop] 0/4218, RunningAvgSamplesPerSec=6.3292954506333325, CurrSamplesPerSec=5.720244698668586, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:36:15,270] 
[INFO] [logging.py:68:log_dist] [Rank 0] step=2110, skipped=5, lr=[6.435555555555556e-06], mom=[[0.9, 0.999]] [2022-12-17 03:36:15,271] [INFO] [timer.py:197:stop] 0/4220, RunningAvgSamplesPerSec=6.3293015183733585, CurrSamplesPerSec=5.70264627745624, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:36:26,550] [INFO] [timer.py:197:stop] 0/4222, RunningAvgSamplesPerSec=6.329314799213386, CurrSamplesPerSec=5.726156173463574, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:36:37,964] [INFO] [timer.py:197:stop] 0/4224, RunningAvgSamplesPerSec=6.329297925836548, CurrSamplesPerSec=5.658801704212691, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:36:49,489] [INFO] [timer.py:197:stop] 0/4226, RunningAvgSamplesPerSec=6.329300030042395, CurrSamplesPerSec=5.680489849179637, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:37:00,832] [INFO] [timer.py:197:stop] 0/4228, RunningAvgSamplesPerSec=6.329290592139952, CurrSamplesPerSec=5.6715307493274505, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:37:12,211] [INFO] [timer.py:197:stop] 0/4230, RunningAvgSamplesPerSec=6.329296799735275, CurrSamplesPerSec=5.710345622158048, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:37:23,534] [INFO] [timer.py:197:stop] 0/4232, RunningAvgSamplesPerSec=6.329293914072135, CurrSamplesPerSec=5.700618996107484, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:37:35,089] [INFO] [timer.py:197:stop] 0/4234, RunningAvgSamplesPerSec=6.329223078174755, CurrSamplesPerSec=5.474385340765658, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:37:46,417] [INFO] [timer.py:197:stop] 0/4236, RunningAvgSamplesPerSec=6.329221569757894, CurrSamplesPerSec=5.6885783348551735, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:37:57,764] [INFO] [timer.py:197:stop] 0/4238, RunningAvgSamplesPerSec=6.3292032239593174, CurrSamplesPerSec=5.621259492527623, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:38:09,097] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=2120, skipped=5, lr=[6.4133333333333335e-06], mom=[[0.9, 0.999]] [2022-12-17 03:38:09,099] [INFO] [timer.py:197:stop] 0/4240, RunningAvgSamplesPerSec=6.3291967428627265, CurrSamplesPerSec=5.660347665602659, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:38:20,450] [INFO] [timer.py:197:stop] 0/4242, RunningAvgSamplesPerSec=6.329187590411158, CurrSamplesPerSec=5.664508866697526, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:38:31,735] [INFO] [timer.py:197:stop] 0/4244, RunningAvgSamplesPerSec=6.329196166706487, CurrSamplesPerSec=5.720654542271054, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:38:43,043] [INFO] [timer.py:197:stop] 0/4246, RunningAvgSamplesPerSec=6.3291981518458655, CurrSamplesPerSec=5.692124476172127, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:38:51,551] [INFO] [timer.py:197:stop] 0/4248, RunningAvgSamplesPerSec=6.329926955740864, CurrSamplesPerSec=10.191885733525591, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:39:02,808] [INFO] [timer.py:197:stop] 0/4250, RunningAvgSamplesPerSec=6.329945497555158, CurrSamplesPerSec=5.748404824921448, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0034, 'learning_rate': 6.402222222222223e-06, 'epoch': 9.0} [2022-12-17 03:39:14,100] [INFO] [timer.py:197:stop] 0/4252, RunningAvgSamplesPerSec=6.329951721289756, CurrSamplesPerSec=5.724151693184711, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:39:25,414] [INFO] [timer.py:197:stop] 0/4254, RunningAvgSamplesPerSec=6.329950908156978, CurrSamplesPerSec=5.700597931576377, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:39:36,694] [INFO] [timer.py:197:stop] 0/4256, RunningAvgSamplesPerSec=6.329955788741976, CurrSamplesPerSec=5.695662231242793, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:39:47,972] [INFO] [timer.py:197:stop] 0/4258, RunningAvgSamplesPerSec=6.329966695173077, CurrSamplesPerSec=5.712562662301858, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:39:59,270] [INFO] [logging.py:68:log_dist] [Rank 0] step=2130, skipped=5, lr=[6.391111111111111e-06], mom=[[0.9, 0.999]] [2022-12-17 03:39:59,272] [INFO] [timer.py:197:stop] 0/4260, RunningAvgSamplesPerSec=6.3299661333167165, CurrSamplesPerSec=5.71034149203002, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:40:10,580] [INFO] [timer.py:197:stop] 0/4262, RunningAvgSamplesPerSec=6.329958580709758, CurrSamplesPerSec=5.673393738119202, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:40:21,866] [INFO] [timer.py:197:stop] 0/4264, RunningAvgSamplesPerSec=6.3299615564115275, CurrSamplesPerSec=5.704911658444872, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:40:33,141] [INFO] [timer.py:197:stop] 0/4266, RunningAvgSamplesPerSec=6.329968199591096, CurrSamplesPerSec=5.7203561137522, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:40:44,456] [INFO] [timer.py:197:stop] 0/4268, RunningAvgSamplesPerSec=6.329969998219967, CurrSamplesPerSec=5.713600561987379, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:40:55,751] [INFO] [timer.py:197:stop] 0/4270, RunningAvgSamplesPerSec=6.329975276219748, CurrSamplesPerSec=5.6975735193423205, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:41:07,069] [INFO] [timer.py:197:stop] 0/4272, RunningAvgSamplesPerSec=6.329973921749979, CurrSamplesPerSec=5.702552995722206, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:41:18,377] [INFO] [timer.py:197:stop] 0/4274, RunningAvgSamplesPerSec=6.329976146136992, CurrSamplesPerSec=5.694227613441014, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:41:29,677] [INFO] [timer.py:197:stop] 0/4276, RunningAvgSamplesPerSec=6.329976503267094, CurrSamplesPerSec=5.705686993857206, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:41:40,961] [INFO] [timer.py:197:stop] 0/4278, RunningAvgSamplesPerSec=6.329981382624964, CurrSamplesPerSec=5.711891438350277, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:41:52,457] [INFO] [logging.py:68:log_dist] [Rank 0] step=2140, skipped=5, lr=[6.368888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 03:41:52,459] [INFO] [timer.py:197:stop] 0/4280, RunningAvgSamplesPerSec=6.329982693290168, CurrSamplesPerSec=5.707232958038073, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:42:03,725] [INFO] [timer.py:197:stop] 0/4282, RunningAvgSamplesPerSec=6.329991865303273, CurrSamplesPerSec=5.713054571959741, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:42:15,042] [INFO] [timer.py:197:stop] 0/4284, RunningAvgSamplesPerSec=6.329990865925492, CurrSamplesPerSec=5.676679663925739, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:42:26,344] [INFO] [timer.py:197:stop] 0/4286, RunningAvgSamplesPerSec=6.3299970498352485, CurrSamplesPerSec=5.709284132894068, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:42:37,618] [INFO] [timer.py:197:stop] 0/4288, RunningAvgSamplesPerSec=6.330009159304688, CurrSamplesPerSec=5.725098322614197, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:42:48,894] [INFO] [timer.py:197:stop] 0/4290, RunningAvgSamplesPerSec=6.330021472725454, CurrSamplesPerSec=5.7156183469429696, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:43:00,189] [INFO] [timer.py:197:stop] 0/4292, RunningAvgSamplesPerSec=6.330023543445085, CurrSamplesPerSec=5.6900711368439065, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:43:11,467] [INFO] [timer.py:197:stop] 0/4294, RunningAvgSamplesPerSec=6.33003465710397, CurrSamplesPerSec=5.722863979052479, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:43:22,753] [INFO] [timer.py:197:stop] 0/4296, RunningAvgSamplesPerSec=6.330044033654043, CurrSamplesPerSec=5.711008949653978, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:43:34,052] [INFO] [timer.py:197:stop] 0/4298, RunningAvgSamplesPerSec=6.330051736381823, CurrSamplesPerSec=5.7098993599376096, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:43:45,358] [INFO] [logging.py:68:log_dist] [Rank 0] step=2150, skipped=5, lr=[6.346666666666668e-06], mom=[[0.9, 0.999]] [2022-12-17 03:43:45,360] [INFO] [timer.py:197:stop] 0/4300, RunningAvgSamplesPerSec=6.330053968045217, CurrSamplesPerSec=5.711852059591176, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0027, 'learning_rate': 6.346666666666668e-06, 'epoch': 9.11} [2022-12-17 03:43:56,648] [INFO] [timer.py:197:stop] 0/4302, RunningAvgSamplesPerSec=6.330057923831548, CurrSamplesPerSec=5.699157196643189, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:44:07,884] [INFO] [timer.py:197:stop] 0/4304, RunningAvgSamplesPerSec=6.3300738850474225, CurrSamplesPerSec=5.737495354931139, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:44:19,393] [INFO] [timer.py:197:stop] 0/4306, RunningAvgSamplesPerSec=6.330077660065506, CurrSamplesPerSec=5.6889624334353215, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:44:30,687] [INFO] [timer.py:197:stop] 0/4308, RunningAvgSamplesPerSec=6.330085346146435, CurrSamplesPerSec=5.708143896824679, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:44:41,955] [INFO] [timer.py:197:stop] 0/4310, RunningAvgSamplesPerSec=6.330095058863617, CurrSamplesPerSec=5.716104697029398, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:44:53,244] [INFO] [timer.py:197:stop] 0/4312, RunningAvgSamplesPerSec=6.330103442896721, CurrSamplesPerSec=5.709839361572828, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:45:04,538] [INFO] [timer.py:197:stop] 0/4314, RunningAvgSamplesPerSec=6.3301104318862835, CurrSamplesPerSec=5.700382210684036, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:45:15,848] [INFO] [timer.py:197:stop] 0/4316, RunningAvgSamplesPerSec=6.33011144671632, CurrSamplesPerSec=5.70177754288311, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:45:27,176] [INFO] [timer.py:197:stop] 0/4318, 
RunningAvgSamplesPerSec=6.330108280879267, CurrSamplesPerSec=5.688100273302122, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:45:38,475] [INFO] [logging.py:68:log_dist] [Rank 0] step=2160, skipped=5, lr=[6.324444444444446e-06], mom=[[0.9, 0.999]] [2022-12-17 03:45:38,477] [INFO] [timer.py:197:stop] 0/4320, RunningAvgSamplesPerSec=6.330114475329232, CurrSamplesPerSec=5.704752106643522, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:45:49,797] [INFO] [timer.py:197:stop] 0/4322, RunningAvgSamplesPerSec=6.330116375057936, CurrSamplesPerSec=5.718255560844172, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:46:01,100] [INFO] [timer.py:197:stop] 0/4324, RunningAvgSamplesPerSec=6.330120682549647, CurrSamplesPerSec=5.712504066748646, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:46:12,375] [INFO] [timer.py:197:stop] 0/4326, RunningAvgSamplesPerSec=6.330133929062189, CurrSamplesPerSec=5.72703455258396, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:46:23,714] [INFO] [timer.py:197:stop] 0/4328, RunningAvgSamplesPerSec=6.330129803058916, CurrSamplesPerSec=5.6888122116816495, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:46:34,999] [INFO] [timer.py:197:stop] 0/4330, RunningAvgSamplesPerSec=6.330134832486639, CurrSamplesPerSec=5.709437380764239, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:46:46,276] [INFO] [timer.py:197:stop] 0/4332, RunningAvgSamplesPerSec=6.330142065442036, CurrSamplesPerSec=5.700766452184192, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:46:57,581] [INFO] [timer.py:197:stop] 0/4334, RunningAvgSamplesPerSec=6.330142658914399, CurrSamplesPerSec=5.686657414993794, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:47:08,853] [INFO] [timer.py:197:stop] 0/4336, RunningAvgSamplesPerSec=6.330150998651473, CurrSamplesPerSec=5.711459517079455, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:47:20,162] [INFO] [timer.py:197:stop] 0/4338, 
RunningAvgSamplesPerSec=6.330149550688539, CurrSamplesPerSec=5.666942151202926, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:47:31,422] [INFO] [logging.py:68:log_dist] [Rank 0] step=2170, skipped=5, lr=[6.302222222222223e-06], mom=[[0.9, 0.999]] [2022-12-17 03:47:31,424] [INFO] [timer.py:197:stop] 0/4340, RunningAvgSamplesPerSec=6.330167309081633, CurrSamplesPerSec=5.715221149418475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:47:42,711] [INFO] [timer.py:197:stop] 0/4342, RunningAvgSamplesPerSec=6.33017199685608, CurrSamplesPerSec=5.686261100566463, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:47:54,166] [INFO] [timer.py:197:stop] 0/4344, RunningAvgSamplesPerSec=6.330183846704809, CurrSamplesPerSec=5.696465033629745, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:48:05,427] [INFO] [timer.py:197:stop] 0/4346, RunningAvgSamplesPerSec=6.330196429462367, CurrSamplesPerSec=5.7171146611417045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:48:16,706] [INFO] [timer.py:197:stop] 0/4348, RunningAvgSamplesPerSec=6.330206849105902, CurrSamplesPerSec=5.714444680646246, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:48:28,017] [INFO] [timer.py:197:stop] 0/4350, RunningAvgSamplesPerSec=6.330208170318338, CurrSamplesPerSec=5.692302394200352, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0032, 'learning_rate': 6.291111111111111e-06, 'epoch': 9.22} [2022-12-17 03:48:39,257] [INFO] [timer.py:197:stop] 0/4352, RunningAvgSamplesPerSec=6.330230027558183, CurrSamplesPerSec=5.74510397307503, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:48:50,553] [INFO] [timer.py:197:stop] 0/4354, RunningAvgSamplesPerSec=6.330236067545069, CurrSamplesPerSec=5.6960760535362125, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:49:01,820] [INFO] [timer.py:197:stop] 0/4356, RunningAvgSamplesPerSec=6.330245950766376, CurrSamplesPerSec=5.709252318598857, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-17 03:49:13,074] [INFO] [timer.py:197:stop] 0/4358, RunningAvgSamplesPerSec=6.3302603683327785, CurrSamplesPerSec=5.723507031478542, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:49:24,355] [INFO] [logging.py:68:log_dist] [Rank 0] step=2180, skipped=5, lr=[6.280000000000001e-06], mom=[[0.9, 0.999]] [2022-12-17 03:49:24,357] [INFO] [timer.py:197:stop] 0/4360, RunningAvgSamplesPerSec=6.330268998724679, CurrSamplesPerSec=5.716786164512912, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:49:35,681] [INFO] [timer.py:197:stop] 0/4362, RunningAvgSamplesPerSec=6.330269874076075, CurrSamplesPerSec=5.709341448091649, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:49:46,987] [INFO] [timer.py:197:stop] 0/4364, RunningAvgSamplesPerSec=6.330273298176687, CurrSamplesPerSec=5.7026029071275754, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:49:58,302] [INFO] [timer.py:197:stop] 0/4366, RunningAvgSamplesPerSec=6.330274186013026, CurrSamplesPerSec=5.711801013859715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:50:09,593] [INFO] [timer.py:197:stop] 0/4368, RunningAvgSamplesPerSec=6.330281236381866, CurrSamplesPerSec=5.714302111692187, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:50:20,878] [INFO] [timer.py:197:stop] 0/4370, RunningAvgSamplesPerSec=6.330293064442466, CurrSamplesPerSec=5.723522652006832, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:50:32,180] [INFO] [timer.py:197:stop] 0/4372, RunningAvgSamplesPerSec=6.330291145422422, CurrSamplesPerSec=5.684595016149614, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:50:43,488] [INFO] [timer.py:197:stop] 0/4374, RunningAvgSamplesPerSec=6.330295297779702, CurrSamplesPerSec=5.705242186997218, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:50:54,780] [INFO] [timer.py:197:stop] 0/4376, RunningAvgSamplesPerSec=6.330302984893042, CurrSamplesPerSec=5.711761879416802, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 
03:51:06,066] [INFO] [timer.py:197:stop] 0/4378, RunningAvgSamplesPerSec=6.3303119676765185, CurrSamplesPerSec=5.730619763654686, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:51:17,354] [INFO] [logging.py:68:log_dist] [Rank 0] step=2190, skipped=5, lr=[6.2577777777777785e-06], mom=[[0.9, 0.999]]
[2022-12-17 03:51:17,355] [INFO] [timer.py:197:stop] 0/4380, RunningAvgSamplesPerSec=6.330315521160135, CurrSamplesPerSec=5.717432966308704, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:51:28,622] [INFO] [timer.py:197:stop] 0/4382, RunningAvgSamplesPerSec=6.330325204840329, CurrSamplesPerSec=5.717960792921026, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:51:39,891] [INFO] [timer.py:197:stop] 0/4384, RunningAvgSamplesPerSec=6.330339635750931, CurrSamplesPerSec=5.732306574514285, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:51:51,192] [INFO] [timer.py:197:stop] 0/4386, RunningAvgSamplesPerSec=6.3303441565671585, CurrSamplesPerSec=5.7166807321598165, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:52:02,549] [INFO] [timer.py:197:stop] 0/4388, RunningAvgSamplesPerSec=6.330334339923144, CurrSamplesPerSec=5.64699055579327, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:52:13,829] [INFO] [timer.py:197:stop] 0/4390, RunningAvgSamplesPerSec=6.330346790628633, CurrSamplesPerSec=5.7156687307214815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:52:25,123] [INFO] [timer.py:197:stop] 0/4392, RunningAvgSamplesPerSec=6.33035745595913, CurrSamplesPerSec=5.719217787639403, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:52:36,425] [INFO] [timer.py:197:stop] 0/4394, RunningAvgSamplesPerSec=6.330364386815062, CurrSamplesPerSec=5.710929245018288, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:52:47,718] [INFO] [timer.py:197:stop] 0/4396, RunningAvgSamplesPerSec=6.330374101398396, CurrSamplesPerSec=5.722247418954987, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:52:58,994] [INFO] [timer.py:197:stop] 0/4398, RunningAvgSamplesPerSec=6.330389488345945, CurrSamplesPerSec=5.73566454879658, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:53:10,356] [INFO] [logging.py:68:log_dist] [Rank 0] step=2200, skipped=5, lr=[6.235555555555556e-06], mom=[[0.9, 0.999]]
[2022-12-17 03:53:10,358] [INFO] [timer.py:197:stop] 0/4400, RunningAvgSamplesPerSec=6.330381002027457, CurrSamplesPerSec=5.683259337924224, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0028, 'learning_rate': 6.235555555555556e-06, 'epoch': 9.32}
[2022-12-17 03:53:21,684] [INFO] [timer.py:197:stop] 0/4402, RunningAvgSamplesPerSec=6.330380721761244, CurrSamplesPerSec=5.681286215532228, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:53:32,959] [INFO] [timer.py:197:stop] 0/4404, RunningAvgSamplesPerSec=6.33039288991234, CurrSamplesPerSec=5.704690276657935, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:53:44,300] [INFO] [timer.py:197:stop] 0/4406, RunningAvgSamplesPerSec=6.3303819555293055, CurrSamplesPerSec=5.672073625116918, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:53:55,668] [INFO] [timer.py:197:stop] 0/4408, RunningAvgSamplesPerSec=6.3303853532268946, CurrSamplesPerSec=5.713735312329119, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:54:06,948] [INFO] [timer.py:197:stop] 0/4410, RunningAvgSamplesPerSec=6.3303942961100095, CurrSamplesPerSec=5.709733942398946, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:54:18,343] [INFO] [timer.py:197:stop] 0/4412, RunningAvgSamplesPerSec=6.330372825528667, CurrSamplesPerSec=5.6119057705684305, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:54:29,681] [INFO] [timer.py:197:stop] 0/4414, RunningAvgSamplesPerSec=6.3303670020010205, CurrSamplesPerSec=5.667097441483019, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:54:41,064] [INFO] [timer.py:197:stop] 0/4416, RunningAvgSamplesPerSec=6.330358021308729, CurrSamplesPerSec=5.664311884604425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:54:52,354] [INFO] [timer.py:197:stop] 0/4418, RunningAvgSamplesPerSec=6.330364103786946, CurrSamplesPerSec=5.71063547488767, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:55:03,657] [INFO] [logging.py:68:log_dist] [Rank 0] step=2210, skipped=5, lr=[6.213333333333334e-06], mom=[[0.9, 0.999]]
[2022-12-17 03:55:03,659] [INFO] [timer.py:197:stop] 0/4420, RunningAvgSamplesPerSec=6.330366350568987, CurrSamplesPerSec=5.696660631635473, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:55:15,004] [INFO] [timer.py:197:stop] 0/4422, RunningAvgSamplesPerSec=6.330356961663918, CurrSamplesPerSec=5.6580722121687375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:55:26,283] [INFO] [timer.py:197:stop] 0/4424, RunningAvgSamplesPerSec=6.330366423816739, CurrSamplesPerSec=5.7198256510353795, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:55:37,536] [INFO] [timer.py:197:stop] 0/4426, RunningAvgSamplesPerSec=6.330382510418656, CurrSamplesPerSec=5.738779356959358, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:55:48,852] [INFO] [timer.py:197:stop] 0/4428, RunningAvgSamplesPerSec=6.330381786900965, CurrSamplesPerSec=5.713057246934883, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:56:00,144] [INFO] [timer.py:197:stop] 0/4430, RunningAvgSamplesPerSec=6.330387543094171, CurrSamplesPerSec=5.70596473014713, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:56:11,431] [INFO] [timer.py:197:stop] 0/4432, RunningAvgSamplesPerSec=6.3303949265506265, CurrSamplesPerSec=5.709284618611247, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:56:22,718] [INFO] [timer.py:197:stop] 0/4434, RunningAvgSamplesPerSec=6.330398974040821, CurrSamplesPerSec=5.728504821701217, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:56:34,005] [INFO] [timer.py:197:stop] 0/4436, RunningAvgSamplesPerSec=6.3304073938842675, CurrSamplesPerSec=5.695226960885154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:56:45,500] [INFO] [timer.py:197:stop] 0/4438, RunningAvgSamplesPerSec=6.330411289428148, CurrSamplesPerSec=5.701911251660377, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:56:56,822] [INFO] [logging.py:68:log_dist] [Rank 0] step=2220, skipped=5, lr=[6.191111111111111e-06], mom=[[0.9, 0.999]]
[2022-12-17 03:56:56,823] [INFO] [timer.py:197:stop] 0/4440, RunningAvgSamplesPerSec=6.33041506366355, CurrSamplesPerSec=5.713683989698677, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:57:08,107] [INFO] [timer.py:197:stop] 0/4442, RunningAvgSamplesPerSec=6.3304343171901, CurrSamplesPerSec=5.7385732499032756, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:57:19,373] [INFO] [timer.py:197:stop] 0/4444, RunningAvgSamplesPerSec=6.330443822660566, CurrSamplesPerSec=5.722055426774207, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:57:30,636] [INFO] [timer.py:197:stop] 0/4446, RunningAvgSamplesPerSec=6.3304531891264775, CurrSamplesPerSec=5.723357908539943, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:57:41,949] [INFO] [timer.py:197:stop] 0/4448, RunningAvgSamplesPerSec=6.3304547739768315, CurrSamplesPerSec=5.717142910205345, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:57:53,236] [INFO] [timer.py:197:stop] 0/4450, RunningAvgSamplesPerSec=6.33046078844013, CurrSamplesPerSec=5.701764705232637, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0027, 'learning_rate': 6.18e-06, 'epoch': 9.43}
[2022-12-17 03:58:04,529] [INFO] [timer.py:197:stop] 0/4452, RunningAvgSamplesPerSec=6.330467659165346, CurrSamplesPerSec=5.713794419628813, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:58:15,810] [INFO] [timer.py:197:stop] 0/4454, RunningAvgSamplesPerSec=6.330472250504414, CurrSamplesPerSec=5.717310218572185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:58:27,136] [INFO] [timer.py:197:stop] 0/4456, RunningAvgSamplesPerSec=6.330470611831681, CurrSamplesPerSec=5.699990516842288, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:58:38,446] [INFO] [timer.py:197:stop] 0/4458, RunningAvgSamplesPerSec=6.330472097937645, CurrSamplesPerSec=5.6944752430173935, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:58:49,765] [INFO] [logging.py:68:log_dist] [Rank 0] step=2230, skipped=5, lr=[6.16888888888889e-06], mom=[[0.9, 0.999]]
[2022-12-17 03:58:49,766] [INFO] [timer.py:197:stop] 0/4460, RunningAvgSamplesPerSec=6.330472808261691, CurrSamplesPerSec=5.685759101297652, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:59:01,199] [INFO] [timer.py:197:stop] 0/4462, RunningAvgSamplesPerSec=6.330476629555121, CurrSamplesPerSec=5.714341767530068, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:59:12,484] [INFO] [timer.py:197:stop] 0/4464, RunningAvgSamplesPerSec=6.330480789170642, CurrSamplesPerSec=5.678016814778323, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:59:23,796] [INFO] [timer.py:197:stop] 0/4466, RunningAvgSamplesPerSec=6.330481948473897, CurrSamplesPerSec=5.708953135855733, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:59:35,079] [INFO] [timer.py:197:stop] 0/4468, RunningAvgSamplesPerSec=6.33048619652435, CurrSamplesPerSec=5.698196869284114, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:59:46,403] [INFO] [timer.py:197:stop] 0/4470, RunningAvgSamplesPerSec=6.330483432308348, CurrSamplesPerSec=5.68910446418724, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 03:59:57,679] [INFO] [timer.py:197:stop] 0/4472, RunningAvgSamplesPerSec=6.330493710341828, CurrSamplesPerSec=5.72305016874372, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:00:08,961] [INFO] [timer.py:197:stop] 0/4474, RunningAvgSamplesPerSec=6.330494027807299, CurrSamplesPerSec=5.6877512402808765, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:00:20,267] [INFO] [timer.py:197:stop] 0/4476, RunningAvgSamplesPerSec=6.3304963388366415, CurrSamplesPerSec=5.690306825164165, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:00:31,592] [INFO] [timer.py:197:stop] 0/4478, RunningAvgSamplesPerSec=6.330488847791333, CurrSamplesPerSec=5.673577202367607, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:00:42,914] [INFO] [logging.py:68:log_dist] [Rank 0] step=2240, skipped=5, lr=[6.146666666666667e-06], mom=[[0.9, 0.999]]
[2022-12-17 04:00:42,915] [INFO] [timer.py:197:stop] 0/4480, RunningAvgSamplesPerSec=6.330486086094699, CurrSamplesPerSec=5.688008913144727, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:00:54,182] [INFO] [timer.py:197:stop] 0/4482, RunningAvgSamplesPerSec=6.330499265528688, CurrSamplesPerSec=5.71943688629642, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:01:05,462] [INFO] [timer.py:197:stop] 0/4484, RunningAvgSamplesPerSec=6.330509348678946, CurrSamplesPerSec=5.71629215134095, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:01:16,762] [INFO] [timer.py:197:stop] 0/4486, RunningAvgSamplesPerSec=6.330513088250893, CurrSamplesPerSec=5.699628889360823, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:01:28,058] [INFO] [timer.py:197:stop] 0/4488, RunningAvgSamplesPerSec=6.330518727226009, CurrSamplesPerSec=5.707317898735957, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:01:39,342] [INFO] [timer.py:197:stop] 0/4490, RunningAvgSamplesPerSec=6.330526565520092, CurrSamplesPerSec=5.702028978709898, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:01:50,602] [INFO] [timer.py:197:stop] 0/4492, RunningAvgSamplesPerSec=6.33054246376338, CurrSamplesPerSec=5.729671796777712, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:02:01,875] [INFO] [timer.py:197:stop] 0/4494, RunningAvgSamplesPerSec=6.330555609378127, CurrSamplesPerSec=5.725686430644263, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:02:13,375] [INFO] [timer.py:197:stop] 0/4496, RunningAvgSamplesPerSec=6.330563896608057, CurrSamplesPerSec=5.706466179325695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:02:24,676] [INFO] [timer.py:197:stop] 0/4498, RunningAvgSamplesPerSec=6.330571840505921, CurrSamplesPerSec=5.7039484213581675, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:02:35,974] [INFO] [logging.py:68:log_dist] [Rank 0] step=2250, skipped=5, lr=[6.124444444444445e-06], mom=[[0.9, 0.999]]
[2022-12-17 04:02:35,976] [INFO] [timer.py:197:stop] 0/4500, RunningAvgSamplesPerSec=6.33057808605072, CurrSamplesPerSec=5.700222912132732, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0029, 'learning_rate': 6.124444444444445e-06, 'epoch': 9.53}
[2022-12-17 04:02:47,309] [INFO] [timer.py:197:stop] 0/4502, RunningAvgSamplesPerSec=6.330590813929422, CurrSamplesPerSec=5.704254595111555, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:02:58,592] [INFO] [timer.py:197:stop] 0/4504, RunningAvgSamplesPerSec=6.330599689229134, CurrSamplesPerSec=5.697111839696128, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:03:09,876] [INFO] [timer.py:197:stop] 0/4506, RunningAvgSamplesPerSec=6.330611905513456, CurrSamplesPerSec=5.717914753402765, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:03:21,200] [INFO] [timer.py:197:stop] 0/4508, RunningAvgSamplesPerSec=6.330605019172982, CurrSamplesPerSec=5.673138107069762, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:03:32,518] [INFO] [timer.py:197:stop] 0/4510, RunningAvgSamplesPerSec=6.330605752558324, CurrSamplesPerSec=5.695185878279826, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:03:43,803] [INFO] [timer.py:197:stop] 0/4512, RunningAvgSamplesPerSec=6.330615522766701, CurrSamplesPerSec=5.706228665318071, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:03:55,121] [INFO] [timer.py:197:stop] 0/4514, RunningAvgSamplesPerSec=6.3306133846847645, CurrSamplesPerSec=5.685259358026691, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:04:06,451] [INFO] [timer.py:197:stop] 0/4516, RunningAvgSamplesPerSec=6.330608909026556, CurrSamplesPerSec=5.700777106120688, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:04:17,795] [INFO] [timer.py:197:stop] 0/4518, RunningAvgSamplesPerSec=6.33060024301429, CurrSamplesPerSec=5.657230118958954, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:04:29,104] [INFO] [logging.py:68:log_dist] [Rank 0] step=2260, skipped=5, lr=[6.102222222222223e-06], mom=[[0.9, 0.999]]
[2022-12-17 04:04:29,106] [INFO] [timer.py:197:stop] 0/4520, RunningAvgSamplesPerSec=6.330601376732257, CurrSamplesPerSec=5.713666476974156, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:04:40,371] [INFO] [timer.py:197:stop] 0/4522, RunningAvgSamplesPerSec=6.330606514763284, CurrSamplesPerSec=5.709888914787506, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:04:51,644] [INFO] [timer.py:197:stop] 0/4524, RunningAvgSamplesPerSec=6.330609315895327, CurrSamplesPerSec=5.686145950865563, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:05:02,950] [INFO] [timer.py:197:stop] 0/4526, RunningAvgSamplesPerSec=6.33061151512052, CurrSamplesPerSec=5.720672585516275, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:05:14,252] [INFO] [timer.py:197:stop] 0/4528, RunningAvgSamplesPerSec=6.330615066568691, CurrSamplesPerSec=5.714858803511764, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:05:25,506] [INFO] [timer.py:197:stop] 0/4530, RunningAvgSamplesPerSec=6.330622922685323, CurrSamplesPerSec=5.71164496542443, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:05:36,782] [INFO] [timer.py:197:stop] 0/4532, RunningAvgSamplesPerSec=6.330625751387737, CurrSamplesPerSec=5.687373812170686, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:05:48,077] [INFO] [timer.py:197:stop] 0/4534, RunningAvgSamplesPerSec=6.3306346663352, CurrSamplesPerSec=5.712731891063314, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:05:59,367] [INFO] [timer.py:197:stop] 0/4536, RunningAvgSamplesPerSec=6.330640941125124, CurrSamplesPerSec=5.700456536914458, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:06:10,640] [INFO] [timer.py:197:stop] 0/4538, RunningAvgSamplesPerSec=6.330652404193294, CurrSamplesPerSec=5.715381530744073, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:06:21,900] [INFO] [logging.py:68:log_dist] [Rank 0] step=2270, skipped=5, lr=[6.08e-06], mom=[[0.9, 0.999]]
[2022-12-17 04:06:21,902] [INFO] [timer.py:197:stop] 0/4540, RunningAvgSamplesPerSec=6.330666901885576, CurrSamplesPerSec=5.737028654425584, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:06:33,224] [INFO] [timer.py:197:stop] 0/4542, RunningAvgSamplesPerSec=6.330665209915535, CurrSamplesPerSec=5.682952526304001, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:06:44,491] [INFO] [timer.py:197:stop] 0/4544, RunningAvgSamplesPerSec=6.330674842163236, CurrSamplesPerSec=5.707736329246788, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:06:55,804] [INFO] [timer.py:197:stop] 0/4546, RunningAvgSamplesPerSec=6.330674269839647, CurrSamplesPerSec=5.701357321521387, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:07:07,141] [INFO] [timer.py:197:stop] 0/4548, RunningAvgSamplesPerSec=6.330686947984364, CurrSamplesPerSec=5.721433919111742, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:07:18,412] [INFO] [timer.py:197:stop] 0/4550, RunningAvgSamplesPerSec=6.330698464805578, CurrSamplesPerSec=5.721675871507792, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0036, 'learning_rate': 6.06888888888889e-06, 'epoch': 9.64}
[2022-12-17 04:07:29,707] [INFO] [timer.py:197:stop] 0/4552, RunningAvgSamplesPerSec=6.330710496101829, CurrSamplesPerSec=5.720422916124233, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:07:41,068] [INFO] [timer.py:197:stop] 0/4554, RunningAvgSamplesPerSec=6.3307208209518695, CurrSamplesPerSec=5.725109311897351, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:07:52,440] [INFO] [timer.py:197:stop] 0/4556, RunningAvgSamplesPerSec=6.330723336313207, CurrSamplesPerSec=5.692474529011012, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:08:03,875] [INFO] [timer.py:197:stop] 0/4558, RunningAvgSamplesPerSec=6.330714611360339, CurrSamplesPerSec=5.664903350183139, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:08:15,227] [INFO] [logging.py:68:log_dist] [Rank 0] step=2280, skipped=5, lr=[6.057777777777778e-06], mom=[[0.9, 0.999]]
[2022-12-17 04:08:15,228] [INFO] [timer.py:197:stop] 0/4560, RunningAvgSamplesPerSec=6.330726531708481, CurrSamplesPerSec=5.729612115939908, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:08:26,573] [INFO] [timer.py:197:stop] 0/4562, RunningAvgSamplesPerSec=6.330743349268332, CurrSamplesPerSec=5.720252987606335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:08:37,916] [INFO] [timer.py:197:stop] 0/4564, RunningAvgSamplesPerSec=6.33075457470811, CurrSamplesPerSec=5.7212497855880375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:08:49,272] [INFO] [timer.py:197:stop] 0/4566, RunningAvgSamplesPerSec=6.330758780378043, CurrSamplesPerSec=5.6899461841189565, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:09:00,622] [INFO] [timer.py:197:stop] 0/4568, RunningAvgSamplesPerSec=6.330764615495463, CurrSamplesPerSec=5.712111191518389, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:09:11,925] [INFO] [timer.py:197:stop] 0/4570, RunningAvgSamplesPerSec=6.330760727916034, CurrSamplesPerSec=5.6708976453672735, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:09:23,181] [INFO] [timer.py:197:stop] 0/4572, RunningAvgSamplesPerSec=6.330776742335515, CurrSamplesPerSec=5.7173092444048805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:09:34,546] [INFO] [timer.py:197:stop] 0/4574, RunningAvgSamplesPerSec=6.33078097114606, CurrSamplesPerSec=5.702969516215391, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:09:45,942] [INFO] [timer.py:197:stop] 0/4576, RunningAvgSamplesPerSec=6.330784055681368, CurrSamplesPerSec=5.708197790402674, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:09:57,327] [INFO] [timer.py:197:stop] 0/4578, RunningAvgSamplesPerSec=6.330790606004829, CurrSamplesPerSec=5.698967234714678, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:10:08,671] [INFO] [logging.py:68:log_dist] [Rank 0] step=2290, skipped=5, lr=[6.0355555555555555e-06], mom=[[0.9, 0.999]]
[2022-12-17 04:10:08,672] [INFO] [timer.py:197:stop] 0/4580, RunningAvgSamplesPerSec=6.33079115086289, CurrSamplesPerSec=5.6993892819688305, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:10:20,063] [INFO] [timer.py:197:stop] 0/4582, RunningAvgSamplesPerSec=6.330792257325494, CurrSamplesPerSec=5.690597301088274, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:10:31,383] [INFO] [timer.py:197:stop] 0/4584, RunningAvgSamplesPerSec=6.33079201294549, CurrSamplesPerSec=5.686121861559865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:10:42,919] [INFO] [timer.py:197:stop] 0/4586, RunningAvgSamplesPerSec=6.33079594497458, CurrSamplesPerSec=5.688110397827378, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:10:54,250] [INFO] [timer.py:197:stop] 0/4588, RunningAvgSamplesPerSec=6.330806829957438, CurrSamplesPerSec=5.736196727782429, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:11:05,530] [INFO] [timer.py:197:stop] 0/4590, RunningAvgSamplesPerSec=6.330812007815817, CurrSamplesPerSec=5.71894777590225, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:11:16,800] [INFO] [timer.py:197:stop] 0/4592, RunningAvgSamplesPerSec=6.330825381671413, CurrSamplesPerSec=5.726484770543781, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:11:28,344] [INFO] [timer.py:197:stop] 0/4594, RunningAvgSamplesPerSec=6.3308329510418035, CurrSamplesPerSec=5.723417459157915, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:11:39,647] [INFO] [timer.py:197:stop] 0/4596, RunningAvgSamplesPerSec=6.330836708797295, CurrSamplesPerSec=5.699751121093182, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:11:51,143] [INFO] [timer.py:197:stop] 0/4598, RunningAvgSamplesPerSec=6.330848828375051, CurrSamplesPerSec=5.73858281883173, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:12:02,491] [INFO] [logging.py:68:log_dist] [Rank 0] step=2300, skipped=5, lr=[6.013333333333335e-06], mom=[[0.9, 0.999]]
[2022-12-17 04:12:02,493] [INFO] [timer.py:197:stop] 0/4600, RunningAvgSamplesPerSec=6.330861170154268, CurrSamplesPerSec=5.724425614623796, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0032, 'learning_rate': 6.013333333333335e-06, 'epoch': 9.75}
[2022-12-17 04:12:13,898] [INFO] [timer.py:197:stop] 0/4602, RunningAvgSamplesPerSec=6.330832410767395, CurrSamplesPerSec=5.5755094488871615, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:12:25,351] [INFO] [timer.py:197:stop] 0/4604, RunningAvgSamplesPerSec=6.3308431891690775, CurrSamplesPerSec=5.727538489991829, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:12:36,684] [INFO] [timer.py:197:stop] 0/4606, RunningAvgSamplesPerSec=6.330837902683982, CurrSamplesPerSec=5.665191716507016, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:12:48,317] [INFO] [timer.py:197:stop] 0/4608, RunningAvgSamplesPerSec=6.330755379814556, CurrSamplesPerSec=5.394187377355717, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:12:59,623] [INFO] [timer.py:197:stop] 0/4610, RunningAvgSamplesPerSec=6.330756819466149, CurrSamplesPerSec=5.6961722661655605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:13:10,941] [INFO] [timer.py:197:stop] 0/4612, RunningAvgSamplesPerSec=6.330756151419777, CurrSamplesPerSec=5.695062150719643, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:13:22,525] [INFO] [timer.py:197:stop] 0/4614, RunningAvgSamplesPerSec=6.330679185520829, CurrSamplesPerSec=5.406924178710288, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:13:33,814] [INFO] [timer.py:197:stop] 0/4616, RunningAvgSamplesPerSec=6.330686433323466, CurrSamplesPerSec=5.6993048191151345, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:13:45,112] [INFO] [timer.py:197:stop] 0/4618, RunningAvgSamplesPerSec=6.330687694288799, CurrSamplesPerSec=5.700686306802098, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:13:56,440] [INFO] [logging.py:68:log_dist] [Rank 0] step=2310, skipped=5, lr=[5.991111111111112e-06], mom=[[0.9, 0.999]]
[2022-12-17 04:13:56,442] [INFO] [timer.py:197:stop] 0/4620, RunningAvgSamplesPerSec=6.3306887015343865, CurrSamplesPerSec=5.662078148631941, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:14:07,726] [INFO] [timer.py:197:stop] 0/4622, RunningAvgSamplesPerSec=6.3306965751068045, CurrSamplesPerSec=5.7048692235839935, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:14:19,039] [INFO] [timer.py:197:stop] 0/4624, RunningAvgSamplesPerSec=6.33070326498081, CurrSamplesPerSec=5.682439803730237, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:14:30,421] [INFO] [timer.py:197:stop] 0/4626, RunningAvgSamplesPerSec=6.330692695475473, CurrSamplesPerSec=5.638133414390749, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:14:41,715] [INFO] [timer.py:197:stop] 0/4628, RunningAvgSamplesPerSec=6.330699691584441, CurrSamplesPerSec=5.717189667889637, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:14:53,008] [INFO] [timer.py:197:stop] 0/4630, RunningAvgSamplesPerSec=6.330705696490953, CurrSamplesPerSec=5.695336920186152, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:15:04,352] [INFO] [timer.py:197:stop] 0/4632, RunningAvgSamplesPerSec=6.330698305517842, CurrSamplesPerSec=5.6486805486610585, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:15:15,647] [INFO] [timer.py:197:stop] 0/4634, RunningAvgSamplesPerSec=6.330704188121081, CurrSamplesPerSec=5.720073561851484, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:15:27,219] [INFO] [timer.py:197:stop] 0/4636, RunningAvgSamplesPerSec=6.330706098111885, CurrSamplesPerSec=5.6777554829107535, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:15:38,794] [INFO] [timer.py:197:stop] 0/4638, RunningAvgSamplesPerSec=6.3307191696421805, CurrSamplesPerSec=5.740637442241396, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:15:50,098] [INFO] [logging.py:68:log_dist] [Rank 0] step=2320, skipped=5, lr=[5.96888888888889e-06], mom=[[0.9, 0.999]]
[2022-12-17 04:15:50,100] [INFO] [timer.py:197:stop] 0/4640, RunningAvgSamplesPerSec=6.330714300595713, CurrSamplesPerSec=5.690648692358206, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:16:01,486] [INFO] [timer.py:197:stop] 0/4642, RunningAvgSamplesPerSec=6.3306947193985605, CurrSamplesPerSec=5.607131295528686, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:16:12,790] [INFO] [timer.py:197:stop] 0/4644, RunningAvgSamplesPerSec=6.3306961430292, CurrSamplesPerSec=5.696737762353815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:16:24,098] [INFO] [timer.py:197:stop] 0/4646, RunningAvgSamplesPerSec=6.330697606070938, CurrSamplesPerSec=5.699602265322378, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:16:35,405] [INFO] [timer.py:197:stop] 0/4648, RunningAvgSamplesPerSec=6.330698876455062, CurrSamplesPerSec=5.714086811994762, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:16:46,764] [INFO] [timer.py:197:stop] 0/4650, RunningAvgSamplesPerSec=6.330690469186335, CurrSamplesPerSec=5.701085118955992, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0039, 'learning_rate': 5.957777777777778e-06, 'epoch': 9.85}
[2022-12-17 04:16:58,107] [INFO] [timer.py:197:stop] 0/4652, RunningAvgSamplesPerSec=6.3306828500294925, CurrSamplesPerSec=5.6933512985319545, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:17:09,667] [INFO] [timer.py:197:stop] 0/4654, RunningAvgSamplesPerSec=6.330616710020938, CurrSamplesPerSec=5.467337482556833, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:17:21,103] [INFO] [timer.py:197:stop] 0/4656, RunningAvgSamplesPerSec=6.330615721908283, CurrSamplesPerSec=5.708993203154572, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:17:32,573] [INFO] [timer.py:197:stop] 0/4658, RunningAvgSamplesPerSec=6.33060873986457, CurrSamplesPerSec=5.676477032949712, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:17:44,017] [INFO] [logging.py:68:log_dist] [Rank 0] step=2330, skipped=5, lr=[5.946666666666668e-06], mom=[[0.9, 0.999]]
[2022-12-17 04:17:44,034] [INFO] [timer.py:197:stop] 0/4660, RunningAvgSamplesPerSec=6.330570240212617, CurrSamplesPerSec=5.560654459755998, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:17:55,547] [INFO] [timer.py:197:stop] 0/4662, RunningAvgSamplesPerSec=6.33057605313111, CurrSamplesPerSec=5.711914774167443, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:18:07,012] [INFO] [timer.py:197:stop] 0/4664, RunningAvgSamplesPerSec=6.330578144373511, CurrSamplesPerSec=5.7113858757955525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:18:18,541] [INFO] [timer.py:197:stop] 0/4666, RunningAvgSamplesPerSec=6.330524070412179, CurrSamplesPerSec=5.487751174558908, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:18:30,062] [INFO] [timer.py:197:stop] 0/4668, RunningAvgSamplesPerSec=6.33053034689637, CurrSamplesPerSec=5.705891958199446, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:18:41,533] [INFO] [timer.py:197:stop] 0/4670, RunningAvgSamplesPerSec=6.330536314630159, CurrSamplesPerSec=5.701401641659355, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:18:52,850] [INFO] [timer.py:197:stop] 0/4672, RunningAvgSamplesPerSec=6.330536460115227, CurrSamplesPerSec=5.671820509831005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:19:04,157] [INFO] [timer.py:197:stop] 0/4674, RunningAvgSamplesPerSec=6.330554419779344, CurrSamplesPerSec=5.738497926375647, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:19:15,446] [INFO] [timer.py:197:stop] 0/4676, RunningAvgSamplesPerSec=6.330560749126159, CurrSamplesPerSec=5.716416559625785, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:19:26,970] [INFO] [timer.py:197:stop] 0/4678, RunningAvgSamplesPerSec=6.330504478971126, CurrSamplesPerSec=5.483726107472392, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:19:38,276] [INFO] [logging.py:68:log_dist] [Rank 0] step=2340, skipped=5, lr=[5.924444444444445e-06], mom=[[0.9, 0.999]]
[2022-12-17 04:19:38,277] [INFO] [timer.py:197:stop] 0/4680, RunningAvgSamplesPerSec=6.3305070302567445, CurrSamplesPerSec=5.6899080722178645, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:19:49,582] [INFO] [timer.py:197:stop] 0/4682, RunningAvgSamplesPerSec=6.330510689869193, CurrSamplesPerSec=5.6991613106084875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:20:01,233] [INFO] [timer.py:197:stop] 0/4684, RunningAvgSamplesPerSec=6.330422201677251, CurrSamplesPerSec=5.354158658429851, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:20:12,510] [INFO] [timer.py:197:stop] 0/4686, RunningAvgSamplesPerSec=6.330433264181468, CurrSamplesPerSec=5.7141668480667915, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:20:23,801] [INFO] [timer.py:197:stop] 0/4688, RunningAvgSamplesPerSec=6.330442375056253, CurrSamplesPerSec=5.719161005075831, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:20:35,398] [INFO] [timer.py:197:stop] 0/4690, RunningAvgSamplesPerSec=6.330370208120584, CurrSamplesPerSec=5.415934115753085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:20:46,670] [INFO] [timer.py:197:stop] 0/4692, RunningAvgSamplesPerSec=6.330381828816718, CurrSamplesPerSec=5.713629262790996, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:20:58,159] [INFO] [timer.py:197:stop] 0/4694, RunningAvgSamplesPerSec=6.330391647125279, CurrSamplesPerSec=5.715702320400646, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:21:09,552] [INFO] [timer.py:197:stop] 0/4696, RunningAvgSamplesPerSec=6.330390667727303, CurrSamplesPerSec=5.727730361440755, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:21:20,834] [INFO] [timer.py:197:stop] 0/4698, RunningAvgSamplesPerSec=6.33039563285138, CurrSamplesPerSec=5.694713229294793, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:21:32,212] [INFO] [logging.py:68:log_dist] [Rank 0] step=2350, skipped=5, lr=[5.902222222222223e-06], mom=[[0.9, 0.999]]
[2022-12-17 04:21:32,214] [INFO] [timer.py:197:stop] 0/4700, RunningAvgSamplesPerSec=6.330408088819912, CurrSamplesPerSec=5.717672875415328, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0035, 'learning_rate': 5.902222222222223e-06, 'epoch': 9.96}
[2022-12-17 04:21:43,841] [INFO] [timer.py:197:stop] 0/4702, RunningAvgSamplesPerSec=6.330403336990846, CurrSamplesPerSec=5.721540014669421, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:21:55,098] [INFO] [timer.py:197:stop] 0/4704, RunningAvgSamplesPerSec=6.330418358984936, CurrSamplesPerSec=5.722686341466305, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:22:06,354] [INFO] [timer.py:197:stop] 0/4706, RunningAvgSamplesPerSec=6.330429863484377, CurrSamplesPerSec=5.722186428937103, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:22:17,857] [INFO] [timer.py:197:stop] 0/4708, RunningAvgSamplesPerSec=6.3303796798625305, CurrSamplesPerSec=5.741505533618278, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:22:29,135] [INFO] [timer.py:197:stop] 0/4710, RunningAvgSamplesPerSec=6.330390373733256, CurrSamplesPerSec=5.729053768988964, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:22:40,498] [INFO] [timer.py:197:stop] 0/4712, RunningAvgSamplesPerSec=6.330378569329705, CurrSamplesPerSec=5.644133107749217, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:22:51,988] [INFO] [timer.py:197:stop] 0/4714, RunningAvgSamplesPerSec=6.330381143253887, CurrSamplesPerSec=5.696042210670469, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:23:03,456] [INFO] [timer.py:197:stop] 0/4716, RunningAvgSamplesPerSec=6.330393422869406, CurrSamplesPerSec=5.716793469438937, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:23:15,099] [INFO] [timer.py:197:stop] 0/4718, RunningAvgSamplesPerSec=6.33030756039602, CurrSamplesPerSec=5.366892319611965, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:23:23,834] [INFO] [logging.py:68:log_dist] [Rank 0] step=2360, skipped=5, lr=[5.8800000000000005e-06], mom=[[0.9, 0.999]]
[2022-12-17 04:23:23,835] [INFO] [timer.py:197:stop] 0/4720, RunningAvgSamplesPerSec=6.330972527543288, CurrSamplesPerSec=10.234348433383115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:23:35,092] [INFO] [timer.py:197:stop] 0/4722, RunningAvgSamplesPerSec=6.330985658534921, CurrSamplesPerSec=5.727015247337471, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:23:46,448] [INFO] [timer.py:197:stop] 0/4724, RunningAvgSamplesPerSec=6.330978732945515, CurrSamplesPerSec=5.661985472808679, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:23:58,056] [INFO] [timer.py:197:stop] 0/4726, RunningAvgSamplesPerSec=6.3309908988236305, CurrSamplesPerSec=5.7250834260977195, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:24:09,589] [INFO] [timer.py:197:stop] 0/4728, RunningAvgSamplesPerSec=6.331000839362033, CurrSamplesPerSec=5.721300756463225, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:24:20,996] [INFO] [timer.py:197:stop] 0/4730, RunningAvgSamplesPerSec=6.3309785423666876, CurrSamplesPerSec=5.590156459974123, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:24:32,356] [INFO] [timer.py:197:stop] 0/4732, RunningAvgSamplesPerSec=6.330984497430182, CurrSamplesPerSec=5.707438276169677, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:24:43,838] [INFO] [timer.py:197:stop] 0/4734, RunningAvgSamplesPerSec=6.33099808507498, CurrSamplesPerSec=5.725624878794464, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:24:55,147] [INFO] [timer.py:197:stop] 0/4736, RunningAvgSamplesPerSec=6.330995778328434, CurrSamplesPerSec=5.663898840429798, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:25:06,588] [INFO] [timer.py:197:stop] 0/4738, RunningAvgSamplesPerSec=6.331004069060532, CurrSamplesPerSec=5.698754540780964, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:25:18,033] [INFO] [logging.py:68:log_dist] [Rank 0] step=2370, skipped=5, lr=[5.857777777777778e-06], mom=[[0.9, 0.999]]
[2022-12-17 04:25:18,035] [INFO] [timer.py:197:stop] 0/4740, RunningAvgSamplesPerSec=6.331010722188499, CurrSamplesPerSec=5.697760243937208, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:25:29,349] [INFO] [timer.py:197:stop] 0/4742, RunningAvgSamplesPerSec=6.331010941877606, CurrSamplesPerSec=5.6695437279093355, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:25:40,631] [INFO] [timer.py:197:stop] 0/4744, RunningAvgSamplesPerSec=6.331019301537191, CurrSamplesPerSec=5.703297153907961, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:25:51,912] [INFO] [timer.py:197:stop] 0/4746, RunningAvgSamplesPerSec=6.331027374547529, CurrSamplesPerSec=5.724606778497856, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:26:03,303] [INFO] [timer.py:197:stop] 0/4748, RunningAvgSamplesPerSec=6.331006687700225, CurrSamplesPerSec=5.595459055112661, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:26:14,592] [INFO] [timer.py:197:stop] 0/4750, RunningAvgSamplesPerSec=6.331013643470817, CurrSamplesPerSec=5.714340551085141, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.003, 'learning_rate': 5.846666666666667e-06, 'epoch': 10.06}
[2022-12-17 04:26:25,876] [INFO] [timer.py:197:stop] 0/4752, RunningAvgSamplesPerSec=6.3310215455665695, CurrSamplesPerSec=5.689917238324662, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:26:37,436] [INFO] [timer.py:197:stop] 0/4754, RunningAvgSamplesPerSec=6.330957040473491, CurrSamplesPerSec=5.434912660706087, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:26:48,726] [INFO] [timer.py:197:stop] 0/4756, RunningAvgSamplesPerSec=6.330964057867343, CurrSamplesPerSec=5.710236783132878, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:27:00,107] [INFO] [timer.py:197:stop] 0/4758, RunningAvgSamplesPerSec=6.3309677032167375, CurrSamplesPerSec=5.715751001795795, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:27:11,573] [INFO] [logging.py:68:log_dist] [Rank 0] step=2380, skipped=5, lr=[5.8355555555555565e-06], mom=[[0.9, 0.999]]
[2022-12-17 04:27:11,575] [INFO] [timer.py:197:stop] 0/4760, RunningAvgSamplesPerSec=6.330973467975903, CurrSamplesPerSec=5.680981781017513, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:27:22,902] [INFO] [timer.py:197:stop] 0/4762, RunningAvgSamplesPerSec=6.3309707590843685, CurrSamplesPerSec=5.6827352515239316, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:27:34,292] [INFO] [timer.py:197:stop] 0/4764, RunningAvgSamplesPerSec=6.330987778779006, CurrSamplesPerSec=5.753313279983759, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:27:45,702] [INFO] [timer.py:197:stop] 0/4766, RunningAvgSamplesPerSec=6.33099289520879, CurrSamplesPerSec=5.73214793479881, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:27:57,013] [INFO] [timer.py:197:stop] 0/4768, RunningAvgSamplesPerSec=6.33099395228108, CurrSamplesPerSec=5.686713794908089, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:28:08,481] [INFO] [timer.py:197:stop] 0/4770, RunningAvgSamplesPerSec=6.3310024447366215, CurrSamplesPerSec=5.73548121382306, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:28:19,983] [INFO] [timer.py:197:stop] 0/4772, RunningAvgSamplesPerSec=6.330988706634111, CurrSamplesPerSec=5.693727588150975, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:28:31,256] [INFO] [timer.py:197:stop] 0/4774, RunningAvgSamplesPerSec=6.330996263719724, CurrSamplesPerSec=5.714685798939968, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:28:42,516] [INFO] [timer.py:197:stop] 0/4776, RunningAvgSamplesPerSec=6.331011640842076, CurrSamplesPerSec=5.7397435947875985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:28:53,854] [INFO] [timer.py:197:stop] 0/4778, RunningAvgSamplesPerSec=6.331007745299911, CurrSamplesPerSec=5.724216387060134, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:29:05,156] [INFO] [logging.py:68:log_dist] [Rank 0] step=2390, skipped=5, lr=[5.813333333333334e-06], mom=[[0.9, 0.999]]
[2022-12-17 04:29:05,158] [INFO] [timer.py:197:stop] 0/4780, RunningAvgSamplesPerSec=6.331010742488278, CurrSamplesPerSec=5.699064996979088, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:29:16,468] [INFO] [timer.py:197:stop] 0/4782, RunningAvgSamplesPerSec=6.331012239137666, CurrSamplesPerSec=5.716506643368661, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:29:27,729] [INFO] [timer.py:197:stop] 0/4784, RunningAvgSamplesPerSec=6.331018580974164, CurrSamplesPerSec=5.714587743334454, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:29:39,013] [INFO] [timer.py:197:stop] 0/4786, RunningAvgSamplesPerSec=6.331022981960946, CurrSamplesPerSec=5.702331554684467, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:29:50,326] [INFO] [timer.py:197:stop] 0/4788, RunningAvgSamplesPerSec=6.331019901591877, CurrSamplesPerSec=5.690069448256989, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:30:01,604] [INFO] [timer.py:197:stop] 0/4790, RunningAvgSamplesPerSec=6.331029794130788, CurrSamplesPerSec=5.716243947629426, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:30:12,897] [INFO]
[timer.py:197:stop] 0/4792, RunningAvgSamplesPerSec=6.331036251624487, CurrSamplesPerSec=5.719949238107081, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:30:24,158] [INFO] [timer.py:197:stop] 0/4794, RunningAvgSamplesPerSec=6.331043100155323, CurrSamplesPerSec=5.702531432327052, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:30:35,431] [INFO] [timer.py:197:stop] 0/4796, RunningAvgSamplesPerSec=6.331050018922631, CurrSamplesPerSec=5.6975415935642415, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:30:46,826] [INFO] [timer.py:197:stop] 0/4798, RunningAvgSamplesPerSec=6.33102938541137, CurrSamplesPerSec=5.688286859627258, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:30:58,154] [INFO] [logging.py:68:log_dist] [Rank 0] step=2400, skipped=5, lr=[5.791111111111112e-06], mom=[[0.9, 0.999]] [2022-12-17 04:30:58,156] [INFO] [timer.py:197:stop] 0/4800, RunningAvgSamplesPerSec=6.331025671102678, CurrSamplesPerSec=5.684242320700877, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0021, 'learning_rate': 5.791111111111112e-06, 'epoch': 10.17} [2022-12-17 04:31:09,453] [INFO] [timer.py:197:stop] 0/4802, RunningAvgSamplesPerSec=6.331027120213348, CurrSamplesPerSec=5.682597147509798, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:31:20,797] [INFO] [timer.py:197:stop] 0/4804, RunningAvgSamplesPerSec=6.331016437845667, CurrSamplesPerSec=5.691636164035205, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:31:32,093] [INFO] [timer.py:197:stop] 0/4806, RunningAvgSamplesPerSec=6.331014511201373, CurrSamplesPerSec=5.683533932334821, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:31:43,425] [INFO] [timer.py:197:stop] 0/4808, RunningAvgSamplesPerSec=6.331017018483959, CurrSamplesPerSec=5.686735720732231, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:31:55,131] [INFO] [timer.py:197:stop] 0/4810, RunningAvgSamplesPerSec=6.331027561745868, CurrSamplesPerSec=5.721124191374545, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:32:06,778] [INFO] [timer.py:197:stop] 0/4812, RunningAvgSamplesPerSec=6.331032378578249, CurrSamplesPerSec=5.701915854066887, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:32:18,288] [INFO] [timer.py:197:stop] 0/4814, RunningAvgSamplesPerSec=6.330999850169187, CurrSamplesPerSec=5.556238848627807, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:32:29,583] [INFO] [timer.py:197:stop] 0/4816, RunningAvgSamplesPerSec=6.331008162515726, CurrSamplesPerSec=5.728340769234183, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:32:40,915] [INFO] [timer.py:197:stop] 0/4818, RunningAvgSamplesPerSec=6.331002794289343, CurrSamplesPerSec=5.681108264065701, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:32:52,634] [INFO] [logging.py:68:log_dist] [Rank 0] step=2410, skipped=5, lr=[5.768888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 04:32:52,636] [INFO] [timer.py:197:stop] 0/4820, RunningAvgSamplesPerSec=6.330896323352138, CurrSamplesPerSec=5.290471094035281, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:33:03,977] [INFO] [timer.py:197:stop] 0/4822, RunningAvgSamplesPerSec=6.330893064737957, CurrSamplesPerSec=5.677873415545301, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:33:15,309] [INFO] [timer.py:197:stop] 0/4824, RunningAvgSamplesPerSec=6.330895551737563, CurrSamplesPerSec=5.6946078846918695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:33:26,713] [INFO] [timer.py:197:stop] 0/4826, RunningAvgSamplesPerSec=6.330872884371823, CurrSamplesPerSec=5.635442266683619, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:33:37,998] [INFO] [timer.py:197:stop] 0/4828, RunningAvgSamplesPerSec=6.33088129819278, CurrSamplesPerSec=5.694139921496582, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:33:49,282] [INFO] [timer.py:197:stop] 0/4830, RunningAvgSamplesPerSec=6.330889321460371, CurrSamplesPerSec=5.706906323939215, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:34:00,782] [INFO] [timer.py:197:stop] 0/4832, RunningAvgSamplesPerSec=6.330877685241006, CurrSamplesPerSec=5.637208456846233, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:34:12,038] [INFO] [timer.py:197:stop] 0/4834, RunningAvgSamplesPerSec=6.330888486726185, CurrSamplesPerSec=5.713096642313446, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:34:23,435] [INFO] [timer.py:197:stop] 0/4836, RunningAvgSamplesPerSec=6.330895314332208, CurrSamplesPerSec=5.717684566931951, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:34:34,736] [INFO] [timer.py:197:stop] 0/4838, RunningAvgSamplesPerSec=6.330900525193966, CurrSamplesPerSec=5.68793563429523, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:34:46,005] [INFO] [logging.py:68:log_dist] [Rank 0] step=2420, skipped=5, lr=[5.746666666666667e-06], mom=[[0.9, 0.999]] [2022-12-17 04:34:46,007] [INFO] [timer.py:197:stop] 0/4840, RunningAvgSamplesPerSec=6.33091151618933, CurrSamplesPerSec=5.7051672509014955, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:34:57,541] [INFO] [timer.py:197:stop] 0/4842, RunningAvgSamplesPerSec=6.330914801125688, CurrSamplesPerSec=5.700826744304333, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:35:09,022] [INFO] [timer.py:197:stop] 0/4844, RunningAvgSamplesPerSec=6.3309054184735265, CurrSamplesPerSec=5.671777607111656, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:35:20,280] [INFO] [timer.py:197:stop] 0/4846, RunningAvgSamplesPerSec=6.330915487625754, CurrSamplesPerSec=5.706281552323623, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:35:31,871] [INFO] [timer.py:197:stop] 0/4848, RunningAvgSamplesPerSec=6.330914274265737, CurrSamplesPerSec=5.6780453994108475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:35:43,422] [INFO] [timer.py:197:stop] 0/4850, RunningAvgSamplesPerSec=6.33090621324211, CurrSamplesPerSec=5.731395739761191, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0021, 'learning_rate': 5.735555555555557e-06, 'epoch': 10.28} [2022-12-17 04:35:54,655] [INFO] [timer.py:197:stop] 0/4852, RunningAvgSamplesPerSec=6.33092348176634, CurrSamplesPerSec=5.737195166807862, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:36:05,945] [INFO] [timer.py:197:stop] 0/4854, RunningAvgSamplesPerSec=6.33093041408174, CurrSamplesPerSec=5.719320876462623, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:36:17,387] [INFO] [timer.py:197:stop] 0/4856, RunningAvgSamplesPerSec=6.330920262213994, CurrSamplesPerSec=5.719814194516181, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:36:28,701] [INFO] [timer.py:197:stop] 0/4858, RunningAvgSamplesPerSec=6.330922055789028, CurrSamplesPerSec=5.686120416208014, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:36:40,001] [INFO] [logging.py:68:log_dist] [Rank 0] step=2430, skipped=5, lr=[5.724444444444445e-06], mom=[[0.9, 0.999]] [2022-12-17 04:36:40,004] [INFO] [timer.py:197:stop] 0/4860, RunningAvgSamplesPerSec=6.330924720708107, CurrSamplesPerSec=5.714667550123298, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:36:51,672] [INFO] [timer.py:197:stop] 0/4862, RunningAvgSamplesPerSec=6.3308836746394785, CurrSamplesPerSec=5.716470122590163, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:37:02,940] [INFO] [timer.py:197:stop] 0/4864, RunningAvgSamplesPerSec=6.330895535122308, CurrSamplesPerSec=5.733101619045565, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:37:14,232] [INFO] [timer.py:197:stop] 0/4866, RunningAvgSamplesPerSec=6.330901820379269, CurrSamplesPerSec=5.705878131739231, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:37:25,508] [INFO] [timer.py:197:stop] 0/4868, RunningAvgSamplesPerSec=6.33090794129048, CurrSamplesPerSec=5.7053280385360585, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:37:36,804] [INFO] [timer.py:197:stop] 0/4870, 
RunningAvgSamplesPerSec=6.330909647225291, CurrSamplesPerSec=5.696581568772256, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:37:48,184] [INFO] [timer.py:197:stop] 0/4872, RunningAvgSamplesPerSec=6.330889102496775, CurrSamplesPerSec=5.614864792503346, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:37:59,850] [INFO] [timer.py:197:stop] 0/4874, RunningAvgSamplesPerSec=6.330887659503165, CurrSamplesPerSec=5.7011781105036325, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:38:11,423] [INFO] [timer.py:197:stop] 0/4876, RunningAvgSamplesPerSec=6.330893858924424, CurrSamplesPerSec=5.698189611806278, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:38:22,772] [INFO] [timer.py:197:stop] 0/4878, RunningAvgSamplesPerSec=6.330885528518763, CurrSamplesPerSec=5.673804330756011, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:38:34,301] [INFO] [logging.py:68:log_dist] [Rank 0] step=2440, skipped=5, lr=[5.702222222222222e-06], mom=[[0.9, 0.999]] [2022-12-17 04:38:34,303] [INFO] [timer.py:197:stop] 0/4880, RunningAvgSamplesPerSec=6.330893383464845, CurrSamplesPerSec=5.717082028950172, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:38:45,586] [INFO] [timer.py:197:stop] 0/4882, RunningAvgSamplesPerSec=6.330902580256411, CurrSamplesPerSec=5.728562523461276, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:38:56,843] [INFO] [timer.py:197:stop] 0/4884, RunningAvgSamplesPerSec=6.330914329730515, CurrSamplesPerSec=5.71846191647385, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:39:08,130] [INFO] [timer.py:197:stop] 0/4886, RunningAvgSamplesPerSec=6.330922027749416, CurrSamplesPerSec=5.739567116761201, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:39:19,403] [INFO] [timer.py:197:stop] 0/4888, RunningAvgSamplesPerSec=6.330933121823192, CurrSamplesPerSec=5.723764781098753, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:39:31,079] [INFO] [timer.py:197:stop] 0/4890, 
RunningAvgSamplesPerSec=6.330841332485585, CurrSamplesPerSec=5.349498429200928, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:39:42,308] [INFO] [timer.py:197:stop] 0/4892, RunningAvgSamplesPerSec=6.330855947690373, CurrSamplesPerSec=5.728922696735184, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:39:53,592] [INFO] [timer.py:197:stop] 0/4894, RunningAvgSamplesPerSec=6.33086065662833, CurrSamplesPerSec=5.6968968661698085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:40:04,953] [INFO] [timer.py:197:stop] 0/4896, RunningAvgSamplesPerSec=6.330849295325178, CurrSamplesPerSec=5.6243742767222225, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:40:16,256] [INFO] [timer.py:197:stop] 0/4898, RunningAvgSamplesPerSec=6.33085324011296, CurrSamplesPerSec=5.696648542370528, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:40:27,669] [INFO] [logging.py:68:log_dist] [Rank 0] step=2450, skipped=5, lr=[5.68e-06], mom=[[0.9, 0.999]] [2022-12-17 04:40:27,671] [INFO] [timer.py:197:stop] 0/4900, RunningAvgSamplesPerSec=6.330869663344087, CurrSamplesPerSec=5.736259242619751, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.002, 'learning_rate': 5.68e-06, 'epoch': 10.38} [2022-12-17 04:40:39,234] [INFO] [timer.py:197:stop] 0/4902, RunningAvgSamplesPerSec=6.330875795922531, CurrSamplesPerSec=5.697687681112232, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:40:50,517] [INFO] [timer.py:197:stop] 0/4904, RunningAvgSamplesPerSec=6.330884282678793, CurrSamplesPerSec=5.71153437551238, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:41:01,993] [INFO] [timer.py:197:stop] 0/4906, RunningAvgSamplesPerSec=6.330895576355606, CurrSamplesPerSec=5.721800514246041, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:41:13,570] [INFO] [timer.py:197:stop] 0/4908, RunningAvgSamplesPerSec=6.330898229565119, CurrSamplesPerSec=5.7448197088231945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:41:24,882] 
[INFO] [timer.py:197:stop] 0/4910, RunningAvgSamplesPerSec=6.330898886215009, CurrSamplesPerSec=5.6970586388875954, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:41:36,127] [INFO] [timer.py:197:stop] 0/4912, RunningAvgSamplesPerSec=6.330912386969782, CurrSamplesPerSec=5.699116541305495, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:41:47,493] [INFO] [timer.py:197:stop] 0/4914, RunningAvgSamplesPerSec=6.330903721488429, CurrSamplesPerSec=5.698654369615493, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:41:58,764] [INFO] [timer.py:197:stop] 0/4916, RunningAvgSamplesPerSec=6.330914261240133, CurrSamplesPerSec=5.732406952943924, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:42:09,996] [INFO] [timer.py:197:stop] 0/4918, RunningAvgSamplesPerSec=6.330934678578837, CurrSamplesPerSec=5.731016657378124, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:42:21,272] [INFO] [logging.py:68:log_dist] [Rank 0] step=2460, skipped=5, lr=[5.657777777777778e-06], mom=[[0.9, 0.999]] [2022-12-17 04:42:21,274] [INFO] [timer.py:197:stop] 0/4920, RunningAvgSamplesPerSec=6.330939690491742, CurrSamplesPerSec=5.706582638769841, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:42:32,531] [INFO] [timer.py:197:stop] 0/4922, RunningAvgSamplesPerSec=6.330953956718717, CurrSamplesPerSec=5.741506024833396, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:42:43,762] [INFO] [timer.py:197:stop] 0/4924, RunningAvgSamplesPerSec=6.330971782272438, CurrSamplesPerSec=5.735810391804533, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:42:55,146] [INFO] [timer.py:197:stop] 0/4926, RunningAvgSamplesPerSec=6.330967983484556, CurrSamplesPerSec=5.715938432909474, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:43:06,393] [INFO] [timer.py:197:stop] 0/4928, RunningAvgSamplesPerSec=6.330985542332555, CurrSamplesPerSec=5.745078397976903, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:43:17,660] [INFO] 
[timer.py:197:stop] 0/4930, RunningAvgSamplesPerSec=6.330994799172411, CurrSamplesPerSec=5.723397202046335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:43:28,909] [INFO] [timer.py:197:stop] 0/4932, RunningAvgSamplesPerSec=6.3310084101233475, CurrSamplesPerSec=5.7396984310883274, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:43:40,183] [INFO] [timer.py:197:stop] 0/4934, RunningAvgSamplesPerSec=6.331015231943894, CurrSamplesPerSec=5.7254964057845985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:43:51,455] [INFO] [timer.py:197:stop] 0/4936, RunningAvgSamplesPerSec=6.3310223319318535, CurrSamplesPerSec=5.731285852101456, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:44:02,808] [INFO] [timer.py:197:stop] 0/4938, RunningAvgSamplesPerSec=6.331024615497289, CurrSamplesPerSec=5.694806496347404, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:44:14,054] [INFO] [logging.py:68:log_dist] [Rank 0] step=2470, skipped=5, lr=[5.635555555555557e-06], mom=[[0.9, 0.999]] [2022-12-17 04:44:14,056] [INFO] [timer.py:197:stop] 0/4940, RunningAvgSamplesPerSec=6.331037883835673, CurrSamplesPerSec=5.726190863716782, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:44:25,339] [INFO] [timer.py:197:stop] 0/4942, RunningAvgSamplesPerSec=6.331046275843268, CurrSamplesPerSec=5.72533814277841, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:44:36,837] [INFO] [timer.py:197:stop] 0/4944, RunningAvgSamplesPerSec=6.330999783404063, CurrSamplesPerSec=5.714860750176948, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:44:48,130] [INFO] [timer.py:197:stop] 0/4946, RunningAvgSamplesPerSec=6.331006276714516, CurrSamplesPerSec=5.693992083341906, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:44:59,676] [INFO] [timer.py:197:stop] 0/4948, RunningAvgSamplesPerSec=6.330950637601543, CurrSamplesPerSec=5.472202700799134, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:45:11,183] [INFO] 
[timer.py:197:stop] 0/4950, RunningAvgSamplesPerSec=6.330955165485506, CurrSamplesPerSec=5.697722027285682, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0024, 'learning_rate': 5.624444444444445e-06, 'epoch': 10.49} [2022-12-17 04:45:22,615] [INFO] [timer.py:197:stop] 0/4952, RunningAvgSamplesPerSec=6.330968195653805, CurrSamplesPerSec=5.728223420030052, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:45:33,954] [INFO] [timer.py:197:stop] 0/4954, RunningAvgSamplesPerSec=6.330962540471415, CurrSamplesPerSec=5.636947553409113, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:45:45,419] [INFO] [timer.py:197:stop] 0/4956, RunningAvgSamplesPerSec=6.33096635642239, CurrSamplesPerSec=5.693265324059259, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:45:56,717] [INFO] [timer.py:197:stop] 0/4958, RunningAvgSamplesPerSec=6.330967716644158, CurrSamplesPerSec=5.69654095017297, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:46:07,973] [INFO] [logging.py:68:log_dist] [Rank 0] step=2480, skipped=5, lr=[5.613333333333334e-06], mom=[[0.9, 0.999]] [2022-12-17 04:46:07,974] [INFO] [timer.py:197:stop] 0/4960, RunningAvgSamplesPerSec=6.330982785160217, CurrSamplesPerSec=5.729682559028319, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:46:19,241] [INFO] [timer.py:197:stop] 0/4962, RunningAvgSamplesPerSec=6.330991543986501, CurrSamplesPerSec=5.71776031937525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:46:30,562] [INFO] [timer.py:197:stop] 0/4964, RunningAvgSamplesPerSec=6.330986218687438, CurrSamplesPerSec=5.683666064780153, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:46:41,955] [INFO] [timer.py:197:stop] 0/4966, RunningAvgSamplesPerSec=6.330967592057502, CurrSamplesPerSec=5.610417109930005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:46:53,258] [INFO] [timer.py:197:stop] 0/4968, RunningAvgSamplesPerSec=6.330971016468183, CurrSamplesPerSec=5.715656560645305, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 04:47:04,541] [INFO] [timer.py:197:stop] 0/4970, RunningAvgSamplesPerSec=6.3309794255496845, CurrSamplesPerSec=5.734612982910162, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:47:16,033] [INFO] [timer.py:197:stop] 0/4972, RunningAvgSamplesPerSec=6.330936383650636, CurrSamplesPerSec=5.529107192673739, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:47:27,313] [INFO] [timer.py:197:stop] 0/4974, RunningAvgSamplesPerSec=6.330946328971878, CurrSamplesPerSec=5.729163081966281, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:47:38,607] [INFO] [timer.py:197:stop] 0/4976, RunningAvgSamplesPerSec=6.330953184900717, CurrSamplesPerSec=5.726244121402355, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:47:50,207] [INFO] [timer.py:197:stop] 0/4978, RunningAvgSamplesPerSec=6.330953538099298, CurrSamplesPerSec=5.68242609069436, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:48:01,496] [INFO] [logging.py:68:log_dist] [Rank 0] step=2490, skipped=5, lr=[5.591111111111112e-06], mom=[[0.9, 0.999]] [2022-12-17 04:48:01,497] [INFO] [timer.py:197:stop] 0/4980, RunningAvgSamplesPerSec=6.33096013014356, CurrSamplesPerSec=5.694920305619834, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:48:12,985] [INFO] [timer.py:197:stop] 0/4982, RunningAvgSamplesPerSec=6.330969576658594, CurrSamplesPerSec=5.704752349116692, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:48:24,480] [INFO] [timer.py:197:stop] 0/4984, RunningAvgSamplesPerSec=6.330971504946974, CurrSamplesPerSec=5.693244555334464, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:48:35,802] [INFO] [timer.py:197:stop] 0/4986, RunningAvgSamplesPerSec=6.33097084853042, CurrSamplesPerSec=5.688880208346578, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:48:47,248] [INFO] [timer.py:197:stop] 0/4988, RunningAvgSamplesPerSec=6.3309845144832435, CurrSamplesPerSec=5.742247609891225, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 04:48:58,752] [INFO] [timer.py:197:stop] 0/4990, RunningAvgSamplesPerSec=6.330976160204492, CurrSamplesPerSec=5.674807318807295, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:49:10,042] [INFO] [timer.py:197:stop] 0/4992, RunningAvgSamplesPerSec=6.3309835858007135, CurrSamplesPerSec=5.713671584841052, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:49:21,342] [INFO] [timer.py:197:stop] 0/4994, RunningAvgSamplesPerSec=6.330989254123995, CurrSamplesPerSec=5.733567190966732, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:49:32,658] [INFO] [timer.py:197:stop] 0/4996, RunningAvgSamplesPerSec=6.330985781418266, CurrSamplesPerSec=5.712304461873893, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:49:43,973] [INFO] [timer.py:197:stop] 0/4998, RunningAvgSamplesPerSec=6.330986864519867, CurrSamplesPerSec=5.688001440544016, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:49:55,252] [INFO] [logging.py:68:log_dist] [Rank 0] step=2500, skipped=5, lr=[5.56888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 04:49:55,254] [INFO] [timer.py:197:stop] 0/5000, RunningAvgSamplesPerSec=6.3309958963633886, CurrSamplesPerSec=5.718180282509813, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0025, 'learning_rate': 5.56888888888889e-06, 'epoch': 10.59} [2022-12-17 04:50:06,731] [INFO] [timer.py:197:stop] 0/5002, RunningAvgSamplesPerSec=6.3309565040950755, CurrSamplesPerSec=5.710061386113425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:50:18,001] [INFO] [timer.py:197:stop] 0/5004, RunningAvgSamplesPerSec=6.330958386668482, CurrSamplesPerSec=5.690579205791651, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:50:29,295] [INFO] [timer.py:197:stop] 0/5006, RunningAvgSamplesPerSec=6.33096474195664, CurrSamplesPerSec=5.70624128047807, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:50:40,632] [INFO] [timer.py:197:stop] 0/5008, 
RunningAvgSamplesPerSec=6.330959864889557, CurrSamplesPerSec=5.72188784057362, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:50:51,923] [INFO] [timer.py:197:stop] 0/5010, RunningAvgSamplesPerSec=6.330966035087153, CurrSamplesPerSec=5.7113448027691005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:51:03,174] [INFO] [timer.py:197:stop] 0/5012, RunningAvgSamplesPerSec=6.330974127924187, CurrSamplesPerSec=5.714591636296829, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:51:14,882] [INFO] [timer.py:197:stop] 0/5014, RunningAvgSamplesPerSec=6.330981396072004, CurrSamplesPerSec=5.719529258836262, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:51:26,372] [INFO] [timer.py:197:stop] 0/5016, RunningAvgSamplesPerSec=6.330988349564209, CurrSamplesPerSec=5.709294332972175, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:51:37,680] [INFO] [timer.py:197:stop] 0/5018, RunningAvgSamplesPerSec=6.330990138475348, CurrSamplesPerSec=5.697592868472862, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:51:49,248] [INFO] [logging.py:68:log_dist] [Rank 0] step=2510, skipped=5, lr=[5.546666666666667e-06], mom=[[0.9, 0.999]] [2022-12-17 04:51:49,250] [INFO] [timer.py:197:stop] 0/5020, RunningAvgSamplesPerSec=6.330983495949797, CurrSamplesPerSec=5.667921172586138, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:52:00,554] [INFO] [timer.py:197:stop] 0/5022, RunningAvgSamplesPerSec=6.330982696273461, CurrSamplesPerSec=5.700785823007439, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:52:12,061] [INFO] [timer.py:197:stop] 0/5024, RunningAvgSamplesPerSec=6.330933809736621, CurrSamplesPerSec=5.477729075145169, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:52:23,555] [INFO] [timer.py:197:stop] 0/5026, RunningAvgSamplesPerSec=6.330939921329972, CurrSamplesPerSec=5.714190932308468, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:52:34,858] [INFO] [timer.py:197:stop] 0/5028, 
RunningAvgSamplesPerSec=6.330943473294494, CurrSamplesPerSec=5.708029315662384, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:52:46,444] [INFO] [timer.py:197:stop] 0/5030, RunningAvgSamplesPerSec=6.330879968150159, CurrSamplesPerSec=5.430234106849332, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:52:57,897] [INFO] [timer.py:197:stop] 0/5032, RunningAvgSamplesPerSec=6.330886515803439, CurrSamplesPerSec=5.708739938717838, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:53:09,136] [INFO] [timer.py:197:stop] 0/5034, RunningAvgSamplesPerSec=6.330906438784912, CurrSamplesPerSec=5.731909991688547, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:53:20,483] [INFO] [timer.py:197:stop] 0/5036, RunningAvgSamplesPerSec=6.33089898868953, CurrSamplesPerSec=5.645936337730132, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:53:32,121] [INFO] [timer.py:197:stop] 0/5038, RunningAvgSamplesPerSec=6.3309053412966305, CurrSamplesPerSec=5.706116101744382, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:53:43,376] [INFO] [logging.py:68:log_dist] [Rank 0] step=2520, skipped=5, lr=[5.524444444444445e-06], mom=[[0.9, 0.999]] [2022-12-17 04:53:43,378] [INFO] [timer.py:197:stop] 0/5040, RunningAvgSamplesPerSec=6.330912990471432, CurrSamplesPerSec=5.70689030866712, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:53:54,672] [INFO] [timer.py:197:stop] 0/5042, RunningAvgSamplesPerSec=6.3309159228723315, CurrSamplesPerSec=5.6894517339217705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:54:05,967] [INFO] [timer.py:197:stop] 0/5044, RunningAvgSamplesPerSec=6.3309220000445, CurrSamplesPerSec=5.7000314267222265, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:54:17,231] [INFO] [timer.py:197:stop] 0/5046, RunningAvgSamplesPerSec=6.330935750681338, CurrSamplesPerSec=5.738988914310895, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:54:28,522] [INFO] [timer.py:197:stop] 0/5048, 
RunningAvgSamplesPerSec=6.330939183099069, CurrSamplesPerSec=5.689545311073165, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:54:39,793] [INFO] [timer.py:197:stop] 0/5050, RunningAvgSamplesPerSec=6.330947485759522, CurrSamplesPerSec=5.712155679007506, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0021, 'learning_rate': 5.513333333333334e-06, 'epoch': 10.7} [2022-12-17 04:54:51,133] [INFO] [timer.py:197:stop] 0/5052, RunningAvgSamplesPerSec=6.33095712793302, CurrSamplesPerSec=5.7344730809313536, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:55:02,649] [INFO] [timer.py:197:stop] 0/5054, RunningAvgSamplesPerSec=6.330958595704715, CurrSamplesPerSec=5.691896602840322, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:55:13,982] [INFO] [timer.py:197:stop] 0/5056, RunningAvgSamplesPerSec=6.330955353948697, CurrSamplesPerSec=5.671333757852513, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:55:25,520] [INFO] [timer.py:197:stop] 0/5058, RunningAvgSamplesPerSec=6.330959551487751, CurrSamplesPerSec=5.705458275887765, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:55:36,931] [INFO] [logging.py:68:log_dist] [Rank 0] step=2530, skipped=5, lr=[5.5022222222222224e-06], mom=[[0.9, 0.999]] [2022-12-17 04:55:36,933] [INFO] [timer.py:197:stop] 0/5060, RunningAvgSamplesPerSec=6.330948864322162, CurrSamplesPerSec=5.692745185578915, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:55:48,218] [INFO] [timer.py:197:stop] 0/5062, RunningAvgSamplesPerSec=6.330952978062427, CurrSamplesPerSec=5.707607201453538, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:55:59,759] [INFO] [timer.py:197:stop] 0/5064, RunningAvgSamplesPerSec=6.330951656585579, CurrSamplesPerSec=5.691243740564786, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:56:11,357] [INFO] [timer.py:197:stop] 0/5066, RunningAvgSamplesPerSec=6.3309381307146335, CurrSamplesPerSec=5.699524088901511, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-17 04:56:22,657] [INFO] [timer.py:197:stop] 0/5068, RunningAvgSamplesPerSec=6.33094166677426, CurrSamplesPerSec=5.704085140967916, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:56:33,962] [INFO] [timer.py:197:stop] 0/5070, RunningAvgSamplesPerSec=6.330944843263456, CurrSamplesPerSec=5.68741405914248, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:56:45,316] [INFO] [timer.py:197:stop] 0/5072, RunningAvgSamplesPerSec=6.330935756879842, CurrSamplesPerSec=5.718039719534393, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:56:56,621] [INFO] [timer.py:197:stop] 0/5074, RunningAvgSamplesPerSec=6.330939117650655, CurrSamplesPerSec=5.707029838678644, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:57:07,918] [INFO] [timer.py:197:stop] 0/5076, RunningAvgSamplesPerSec=6.330944165593699, CurrSamplesPerSec=5.705340164667865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:57:19,572] [INFO] [timer.py:197:stop] 0/5078, RunningAvgSamplesPerSec=6.3309475667119015, CurrSamplesPerSec=5.720259082428829, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:57:30,872] [INFO] [logging.py:68:log_dist] [Rank 0] step=2540, skipped=5, lr=[5.480000000000001e-06], mom=[[0.9, 0.999]]
[2022-12-17 04:57:30,874] [INFO] [timer.py:197:stop] 0/5080, RunningAvgSamplesPerSec=6.330950895937812, CurrSamplesPerSec=5.7041610183311, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:57:42,225] [INFO] [timer.py:197:stop] 0/5082, RunningAvgSamplesPerSec=6.3309426889380465, CurrSamplesPerSec=5.622238335501086, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:57:53,760] [INFO] [timer.py:197:stop] 0/5084, RunningAvgSamplesPerSec=6.330952854774201, CurrSamplesPerSec=5.729616273990773, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:58:05,143] [INFO] [timer.py:197:stop] 0/5086, RunningAvgSamplesPerSec=6.3309542655873265, CurrSamplesPerSec=5.703234143735746, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:58:16,445] [INFO] [timer.py:197:stop] 0/5088, RunningAvgSamplesPerSec=6.330960905784509, CurrSamplesPerSec=5.7160389692080775, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:58:28,076] [INFO] [timer.py:197:stop] 0/5090, RunningAvgSamplesPerSec=6.330968824088808, CurrSamplesPerSec=5.706404554792997, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:58:39,364] [INFO] [timer.py:197:stop] 0/5092, RunningAvgSamplesPerSec=6.330972199325294, CurrSamplesPerSec=5.693679764246615, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:58:50,726] [INFO] [timer.py:197:stop] 0/5094, RunningAvgSamplesPerSec=6.330961521446245, CurrSamplesPerSec=5.635776153192244, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:59:02,022] [INFO] [timer.py:197:stop] 0/5096, RunningAvgSamplesPerSec=6.33096897362408, CurrSamplesPerSec=5.719917061077151, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:59:13,270] [INFO] [timer.py:197:stop] 0/5098, RunningAvgSamplesPerSec=6.330985743984386, CurrSamplesPerSec=5.742667983797375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:59:24,542] [INFO] [logging.py:68:log_dist] [Rank 0] step=2550, skipped=5, lr=[5.4577777777777785e-06], mom=[[0.9, 0.999]]
[2022-12-17 04:59:24,543] [INFO] [timer.py:197:stop] 0/5100, RunningAvgSamplesPerSec=6.330996104084737, CurrSamplesPerSec=5.716025093527925, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0018, 'learning_rate': 5.4577777777777785e-06, 'epoch': 10.81}
[2022-12-17 04:59:35,910] [INFO] [timer.py:197:stop] 0/5102, RunningAvgSamplesPerSec=6.330983921264206, CurrSamplesPerSec=5.626090613198088, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:59:47,213] [INFO] [timer.py:197:stop] 0/5104, RunningAvgSamplesPerSec=6.330988431641169, CurrSamplesPerSec=5.70301798095724, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 04:59:58,451] [INFO] [timer.py:197:stop] 0/5106, RunningAvgSamplesPerSec=6.331007527523088, CurrSamplesPerSec=5.757828958869349, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:00:09,982] [INFO] [timer.py:197:stop] 0/5108, RunningAvgSamplesPerSec=6.331019128346327, CurrSamplesPerSec=5.731908033391001, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:00:21,267] [INFO] [timer.py:197:stop] 0/5110, RunningAvgSamplesPerSec=6.331032777766586, CurrSamplesPerSec=5.727329034819482, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:00:32,810] [INFO] [timer.py:197:stop] 0/5112, RunningAvgSamplesPerSec=6.330977003613128, CurrSamplesPerSec=5.451910508189052, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:00:44,368] [INFO] [timer.py:197:stop] 0/5114, RunningAvgSamplesPerSec=6.330978114522191, CurrSamplesPerSec=5.696982224919324, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:00:55,670] [INFO] [timer.py:197:stop] 0/5116, RunningAvgSamplesPerSec=6.330987497891552, CurrSamplesPerSec=5.707285135595816, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:01:07,354] [INFO] [timer.py:197:stop] 0/5118, RunningAvgSamplesPerSec=6.330899081644487, CurrSamplesPerSec=5.3293482591493655, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:01:18,753] [INFO] [logging.py:68:log_dist] [Rank 0] step=2560, skipped=5, lr=[5.435555555555556e-06], mom=[[0.9, 0.999]]
[2022-12-17 05:01:18,755] [INFO] [timer.py:197:stop] 0/5120, RunningAvgSamplesPerSec=6.330901902893264, CurrSamplesPerSec=5.715834005486816, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:01:30,224] [INFO] [timer.py:197:stop] 0/5122, RunningAvgSamplesPerSec=6.330903788047218, CurrSamplesPerSec=5.690690433368548, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:01:41,549] [INFO] [timer.py:197:stop] 0/5124, RunningAvgSamplesPerSec=6.330899001074299, CurrSamplesPerSec=5.678515045294561, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:01:53,137] [INFO] [timer.py:197:stop] 0/5126, RunningAvgSamplesPerSec=6.330897736958658,
CurrSamplesPerSec=5.681188581320298, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:02:04,450] [INFO] [timer.py:197:stop] 0/5128, RunningAvgSamplesPerSec=6.330906470721249, CurrSamplesPerSec=5.7209881168777335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:02:16,010] [INFO] [timer.py:197:stop] 0/5130, RunningAvgSamplesPerSec=6.330847665324243, CurrSamplesPerSec=5.448622762386459, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:02:27,308] [INFO] [timer.py:197:stop] 0/5132, RunningAvgSamplesPerSec=6.330851968143579, CurrSamplesPerSec=5.695187811565611, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:02:38,633] [INFO] [timer.py:197:stop] 0/5134, RunningAvgSamplesPerSec=6.3308504728612265, CurrSamplesPerSec=5.6930532969919625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:02:50,042] [INFO] [timer.py:197:stop] 0/5136, RunningAvgSamplesPerSec=6.330828568356297, CurrSamplesPerSec=5.590546942353913, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:03:01,351] [INFO] [timer.py:197:stop] 0/5138, RunningAvgSamplesPerSec=6.330831029018583, CurrSamplesPerSec=5.692884024759842, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:03:12,670] [INFO] [logging.py:68:log_dist] [Rank 0] step=2570, skipped=5, lr=[5.413333333333334e-06], mom=[[0.9, 0.999]] [2022-12-17 05:03:12,671] [INFO] [timer.py:197:stop] 0/5140, RunningAvgSamplesPerSec=6.330831830621784, CurrSamplesPerSec=5.692991961810583, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:03:24,122] [INFO] [timer.py:197:stop] 0/5142, RunningAvgSamplesPerSec=6.3307995165787485, CurrSamplesPerSec=5.557602468839612, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:03:35,475] [INFO] [timer.py:197:stop] 0/5144, RunningAvgSamplesPerSec=6.330791148922355, CurrSamplesPerSec=5.66601202336115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:03:46,865] [INFO] [timer.py:197:stop] 0/5146, RunningAvgSamplesPerSec=6.330793328574028, 
CurrSamplesPerSec=5.694739566054518, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:03:58,525] [INFO] [timer.py:197:stop] 0/5148, RunningAvgSamplesPerSec=6.3307985999441545, CurrSamplesPerSec=5.71201687060952, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:04:09,847] [INFO] [timer.py:197:stop] 0/5150, RunningAvgSamplesPerSec=6.330798264943373, CurrSamplesPerSec=5.688799432348064, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0022, 'learning_rate': 5.402222222222223e-06, 'epoch': 10.91} [2022-12-17 05:04:21,267] [INFO] [timer.py:197:stop] 0/5152, RunningAvgSamplesPerSec=6.3308049922936895, CurrSamplesPerSec=5.739957886590591, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:04:32,804] [INFO] [timer.py:197:stop] 0/5154, RunningAvgSamplesPerSec=6.330808207297696, CurrSamplesPerSec=5.714203582698553, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:04:44,095] [INFO] [timer.py:197:stop] 0/5156, RunningAvgSamplesPerSec=6.330814103852655, CurrSamplesPerSec=5.70229884876786, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:04:55,418] [INFO] [timer.py:197:stop] 0/5158, RunningAvgSamplesPerSec=6.330826468927088, CurrSamplesPerSec=5.736627004540996, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:05:06,939] [INFO] [logging.py:68:log_dist] [Rank 0] step=2580, skipped=5, lr=[5.391111111111111e-06], mom=[[0.9, 0.999]] [2022-12-17 05:05:06,941] [INFO] [timer.py:197:stop] 0/5160, RunningAvgSamplesPerSec=6.3308266043403725, CurrSamplesPerSec=5.698408553854269, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:05:18,251] [INFO] [timer.py:197:stop] 0/5162, RunningAvgSamplesPerSec=6.330825475976958, CurrSamplesPerSec=5.6952141527152, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:05:29,525] [INFO] [timer.py:197:stop] 0/5164, RunningAvgSamplesPerSec=6.330832549112537, CurrSamplesPerSec=5.7298214935295935, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:05:41,133] [INFO] 
[timer.py:197:stop] 0/5166, RunningAvgSamplesPerSec=6.330825425802116, CurrSamplesPerSec=5.6973268294471975, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:05:52,434] [INFO] [timer.py:197:stop] 0/5168, RunningAvgSamplesPerSec=6.3308264370216225, CurrSamplesPerSec=5.683646328760508, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:06:03,725] [INFO] [timer.py:197:stop] 0/5170, RunningAvgSamplesPerSec=6.330833681391844, CurrSamplesPerSec=5.717644133973545, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:06:15,065] [INFO] [timer.py:197:stop] 0/5172, RunningAvgSamplesPerSec=6.330828764624116, CurrSamplesPerSec=5.7181651783578396, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:06:26,405] [INFO] [timer.py:197:stop] 0/5174, RunningAvgSamplesPerSec=6.330823795604273, CurrSamplesPerSec=5.675452815519628, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:06:37,674] [INFO] [timer.py:197:stop] 0/5176, RunningAvgSamplesPerSec=6.33083273222414, CurrSamplesPerSec=5.721885157322745, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:06:49,031] [INFO] [timer.py:197:stop] 0/5178, RunningAvgSamplesPerSec=6.330822949193593, CurrSamplesPerSec=5.683569311458349, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:07:00,323] [INFO] [logging.py:68:log_dist] [Rank 0] step=2590, skipped=5, lr=[5.368888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 05:07:00,326] [INFO] [timer.py:197:stop] 0/5180, RunningAvgSamplesPerSec=6.3308245978339865, CurrSamplesPerSec=5.687645430029632, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:07:11,607] [INFO] [timer.py:197:stop] 0/5182, RunningAvgSamplesPerSec=6.330829600063001, CurrSamplesPerSec=5.702640220108133, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:07:22,922] [INFO] [timer.py:197:stop] 0/5184, RunningAvgSamplesPerSec=6.3308299773962595, CurrSamplesPerSec=5.6947854747502085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:07:34,227] [INFO] 
[timer.py:197:stop] 0/5186, RunningAvgSamplesPerSec=6.330840511623829, CurrSamplesPerSec=5.7253488887550406, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:07:45,694] [INFO] [timer.py:197:stop] 0/5188, RunningAvgSamplesPerSec=6.330802234493931, CurrSamplesPerSec=5.545931811238111, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:07:57,240] [INFO] [timer.py:197:stop] 0/5190, RunningAvgSamplesPerSec=6.3308106513252556, CurrSamplesPerSec=5.714275107111477, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:08:05,702] [INFO] [timer.py:197:stop] 0/5192, RunningAvgSamplesPerSec=6.331415568524979, CurrSamplesPerSec=10.224405036507427, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:08:16,994] [INFO] [timer.py:197:stop] 0/5194, RunningAvgSamplesPerSec=6.33142151258471, CurrSamplesPerSec=5.708924967667167, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:08:28,320] [INFO] [timer.py:197:stop] 0/5196, RunningAvgSamplesPerSec=6.331419035586852, CurrSamplesPerSec=5.720213493471337, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:08:39,593] [INFO] [timer.py:197:stop] 0/5198, RunningAvgSamplesPerSec=6.331429428749043, CurrSamplesPerSec=5.717296823800851, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:08:50,859] [INFO] [logging.py:68:log_dist] [Rank 0] step=2600, skipped=5, lr=[5.346666666666667e-06], mom=[[0.9, 0.999]] [2022-12-17 05:08:50,861] [INFO] [timer.py:197:stop] 0/5200, RunningAvgSamplesPerSec=6.33144061780039, CurrSamplesPerSec=5.7277724037809055, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0016, 'learning_rate': 5.346666666666667e-06, 'epoch': 11.02} [2022-12-17 05:09:02,114] [INFO] [timer.py:197:stop] 0/5202, RunningAvgSamplesPerSec=6.331448483641698, CurrSamplesPerSec=5.719249225680997, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:09:13,434] [INFO] [timer.py:197:stop] 0/5204, RunningAvgSamplesPerSec=6.331448183327159, CurrSamplesPerSec=5.699337248682659, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:09:24,734] [INFO] [timer.py:197:stop] 0/5206, RunningAvgSamplesPerSec=6.331452657655695, CurrSamplesPerSec=5.700206208089163, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:09:36,014] [INFO] [timer.py:197:stop] 0/5208, RunningAvgSamplesPerSec=6.331461622033795, CurrSamplesPerSec=5.72109761000533, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:09:47,276] [INFO] [timer.py:197:stop] 0/5210, RunningAvgSamplesPerSec=6.3314746264100314, CurrSamplesPerSec=5.716316496958784, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:09:58,571] [INFO] [timer.py:197:stop] 0/5212, RunningAvgSamplesPerSec=6.331480978826958, CurrSamplesPerSec=5.725256083923386, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:10:09,843] [INFO] [timer.py:197:stop] 0/5214, RunningAvgSamplesPerSec=6.331491432789969, CurrSamplesPerSec=5.725326664166113, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:10:21,125] [INFO] [timer.py:197:stop] 0/5216, RunningAvgSamplesPerSec=6.331499899781304, CurrSamplesPerSec=5.7110269321049865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:10:32,392] [INFO] [timer.py:197:stop] 0/5218, RunningAvgSamplesPerSec=6.331511769858337, CurrSamplesPerSec=5.712665997638112, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:10:43,678] [INFO] [logging.py:68:log_dist] [Rank 0] step=2610, skipped=5, lr=[5.324444444444445e-06], mom=[[0.9, 0.999]] [2022-12-17 05:10:43,680] [INFO] [timer.py:197:stop] 0/5220, RunningAvgSamplesPerSec=6.331518616398237, CurrSamplesPerSec=5.7057262876552635, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:10:54,977] [INFO] [timer.py:197:stop] 0/5222, RunningAvgSamplesPerSec=6.331523063457298, CurrSamplesPerSec=5.700985107994975, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:11:06,265] [INFO] [timer.py:197:stop] 0/5224, RunningAvgSamplesPerSec=6.331530146740582, CurrSamplesPerSec=5.712085180028847, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:11:17,572] [INFO] [timer.py:197:stop] 0/5226, RunningAvgSamplesPerSec=6.331532193459119, CurrSamplesPerSec=5.693007416219925, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:11:28,880] [INFO] [timer.py:197:stop] 0/5228, RunningAvgSamplesPerSec=6.331534394982881, CurrSamplesPerSec=5.702848600275808, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:11:40,160] [INFO] [timer.py:197:stop] 0/5230, RunningAvgSamplesPerSec=6.331543863225301, CurrSamplesPerSec=5.727150386796147, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:11:51,450] [INFO] [timer.py:197:stop] 0/5232, RunningAvgSamplesPerSec=6.3315500633562944, CurrSamplesPerSec=5.715176370807533, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:12:02,731] [INFO] [timer.py:197:stop] 0/5234, RunningAvgSamplesPerSec=6.331554300362814, CurrSamplesPerSec=5.718985546768476, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:12:14,015] [INFO] [timer.py:197:stop] 0/5236, RunningAvgSamplesPerSec=6.3315630531989795, CurrSamplesPerSec=5.7190143016978965, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:12:25,279] [INFO] [timer.py:197:stop] 0/5238, RunningAvgSamplesPerSec=6.331575216395316, CurrSamplesPerSec=5.733168474540448, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:12:36,556] [INFO] [logging.py:68:log_dist] [Rank 0] step=2620, skipped=5, lr=[5.302222222222223e-06], mom=[[0.9, 0.999]] [2022-12-17 05:12:36,557] [INFO] [timer.py:197:stop] 0/5240, RunningAvgSamplesPerSec=6.331583901131349, CurrSamplesPerSec=5.7215373317447575, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:12:47,841] [INFO] [timer.py:197:stop] 0/5242, RunningAvgSamplesPerSec=6.331591212199751, CurrSamplesPerSec=5.712049201910398, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:12:59,137] [INFO] [timer.py:197:stop] 0/5244, RunningAvgSamplesPerSec=6.331595922910599, CurrSamplesPerSec=5.70338658227336, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:13:10,454] [INFO] [timer.py:197:stop] 0/5246, RunningAvgSamplesPerSec=6.3315953067483335, CurrSamplesPerSec=5.677981985152896, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:13:21,789] [INFO] [timer.py:197:stop] 0/5248, RunningAvgSamplesPerSec=6.331597185771025, CurrSamplesPerSec=5.705788625223031, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:13:33,159] [INFO] [timer.py:197:stop] 0/5250, RunningAvgSamplesPerSec=6.331600861122208, CurrSamplesPerSec=5.699541273008275, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0011, 'learning_rate': 5.2911111111111115e-06, 'epoch': 11.12} [2022-12-17 05:13:44,480] [INFO] [timer.py:197:stop] 0/5252, RunningAvgSamplesPerSec=6.331599661255974, CurrSamplesPerSec=5.68312072716045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:13:55,971] [INFO] [timer.py:197:stop] 0/5254, RunningAvgSamplesPerSec=6.331603344469879, CurrSamplesPerSec=5.695243394093243, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:14:07,295] [INFO] [timer.py:197:stop] 0/5256, RunningAvgSamplesPerSec=6.331597969028604, CurrSamplesPerSec=5.66516708705912, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:14:18,576] [INFO] [timer.py:197:stop] 0/5258, RunningAvgSamplesPerSec=6.331607527232612, CurrSamplesPerSec=5.733064396149069, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:14:29,890] [INFO] [logging.py:68:log_dist] [Rank 0] step=2630, skipped=5, lr=[5.28e-06], mom=[[0.9, 0.999]] [2022-12-17 05:14:29,893] [INFO] [timer.py:197:stop] 0/5260, RunningAvgSamplesPerSec=6.331607577304003, CurrSamplesPerSec=5.703569083004653, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:14:41,225] [INFO] [timer.py:197:stop] 0/5262, RunningAvgSamplesPerSec=6.331622020508325, CurrSamplesPerSec=5.732581277009406, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:14:52,454] [INFO] [timer.py:197:stop] 0/5264, 
RunningAvgSamplesPerSec=6.331639133639575, CurrSamplesPerSec=5.738322751616373, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:15:03,721] [INFO] [timer.py:197:stop] 0/5266, RunningAvgSamplesPerSec=6.33165171193482, CurrSamplesPerSec=5.7073268783289315, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:15:15,023] [INFO] [timer.py:197:stop] 0/5268, RunningAvgSamplesPerSec=6.331655237231968, CurrSamplesPerSec=5.6875104613911, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:15:26,312] [INFO] [timer.py:197:stop] 0/5270, RunningAvgSamplesPerSec=6.331662476795234, CurrSamplesPerSec=5.703980176569691, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:15:37,605] [INFO] [timer.py:197:stop] 0/5272, RunningAvgSamplesPerSec=6.33166852698621, CurrSamplesPerSec=5.722610214448023, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:15:48,870] [INFO] [timer.py:197:stop] 0/5274, RunningAvgSamplesPerSec=6.33168179634596, CurrSamplesPerSec=5.734479941114413, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:16:00,126] [INFO] [timer.py:197:stop] 0/5276, RunningAvgSamplesPerSec=6.331695376311775, CurrSamplesPerSec=5.709654759127998, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:16:11,408] [INFO] [timer.py:197:stop] 0/5278, RunningAvgSamplesPerSec=6.331700742788226, CurrSamplesPerSec=5.71670313316583, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:16:22,679] [INFO] [logging.py:68:log_dist] [Rank 0] step=2640, skipped=5, lr=[5.257777777777779e-06], mom=[[0.9, 0.999]] [2022-12-17 05:16:22,681] [INFO] [timer.py:197:stop] 0/5280, RunningAvgSamplesPerSec=6.331707264117074, CurrSamplesPerSec=5.704465032869232, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:16:34,130] [INFO] [timer.py:197:stop] 0/5282, RunningAvgSamplesPerSec=6.331710296952607, CurrSamplesPerSec=5.699808971206135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:16:45,425] [INFO] [timer.py:197:stop] 0/5284, 
RunningAvgSamplesPerSec=6.331715787203118, CurrSamplesPerSec=5.7361687804135455, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:16:56,757] [INFO] [timer.py:197:stop] 0/5286, RunningAvgSamplesPerSec=6.331716077735873, CurrSamplesPerSec=5.7076763763304585, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:17:08,057] [INFO] [timer.py:197:stop] 0/5288, RunningAvgSamplesPerSec=6.3317203145894725, CurrSamplesPerSec=5.714706237753005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:17:19,368] [INFO] [timer.py:197:stop] 0/5290, RunningAvgSamplesPerSec=6.331722296646484, CurrSamplesPerSec=5.699027246796303, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:17:30,729] [INFO] [timer.py:197:stop] 0/5292, RunningAvgSamplesPerSec=6.331713096218976, CurrSamplesPerSec=5.672534851579894, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:17:42,048] [INFO] [timer.py:197:stop] 0/5294, RunningAvgSamplesPerSec=6.331713210379387, CurrSamplesPerSec=5.700520696294161, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:17:53,394] [INFO] [timer.py:197:stop] 0/5296, RunningAvgSamplesPerSec=6.331707193585852, CurrSamplesPerSec=5.692714762593404, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:18:04,741] [INFO] [timer.py:197:stop] 0/5298, RunningAvgSamplesPerSec=6.3317006432874114, CurrSamplesPerSec=5.682712394124242, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:18:16,059] [INFO] [logging.py:68:log_dist] [Rank 0] step=2650, skipped=5, lr=[5.235555555555556e-06], mom=[[0.9, 0.999]] [2022-12-17 05:18:16,061] [INFO] [timer.py:197:stop] 0/5300, RunningAvgSamplesPerSec=6.331700367364679, CurrSamplesPerSec=5.707803322659478, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0012, 'learning_rate': 5.235555555555556e-06, 'epoch': 11.23} [2022-12-17 05:18:27,358] [INFO] [timer.py:197:stop] 0/5302, RunningAvgSamplesPerSec=6.331705986411841, CurrSamplesPerSec=5.715463793550649, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 05:18:38,619] [INFO] [timer.py:197:stop] 0/5304, RunningAvgSamplesPerSec=6.331712476197681, CurrSamplesPerSec=5.721632455156021, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:18:49,899] [INFO] [timer.py:197:stop] 0/5306, RunningAvgSamplesPerSec=6.331716192002087, CurrSamplesPerSec=5.710224879116211, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:19:01,197] [INFO] [timer.py:197:stop] 0/5308, RunningAvgSamplesPerSec=6.331718498716094, CurrSamplesPerSec=5.696512420788828, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:19:12,489] [INFO] [timer.py:197:stop] 0/5310, RunningAvgSamplesPerSec=6.331716879305524, CurrSamplesPerSec=5.6943266626871045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:19:23,899] [INFO] [timer.py:197:stop] 0/5312, RunningAvgSamplesPerSec=6.331722781967347, CurrSamplesPerSec=5.709408964927718, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:19:35,201] [INFO] [timer.py:197:stop] 0/5314, RunningAvgSamplesPerSec=6.331726558596379, CurrSamplesPerSec=5.717023340647629, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:19:46,664] [INFO] [timer.py:197:stop] 0/5316, RunningAvgSamplesPerSec=6.33172464843963, CurrSamplesPerSec=5.688770015987062, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:19:57,951] [INFO] [timer.py:197:stop] 0/5318, RunningAvgSamplesPerSec=6.331731507360793, CurrSamplesPerSec=5.714302841549262, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:20:09,198] [INFO] [logging.py:68:log_dist] [Rank 0] step=2660, skipped=5, lr=[5.213333333333334e-06], mom=[[0.9, 0.999]] [2022-12-17 05:20:09,200] [INFO] [timer.py:197:stop] 0/5320, RunningAvgSamplesPerSec=6.331747607613066, CurrSamplesPerSec=5.7378510105983604, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:20:20,482] [INFO] [timer.py:197:stop] 0/5322, RunningAvgSamplesPerSec=6.33175641041769, CurrSamplesPerSec=5.727818113304493, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 05:20:31,752] [INFO] [timer.py:197:stop] 0/5324, RunningAvgSamplesPerSec=6.331768182998982, CurrSamplesPerSec=5.730554190945366, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:20:43,017] [INFO] [timer.py:197:stop] 0/5326, RunningAvgSamplesPerSec=6.331777148866358, CurrSamplesPerSec=5.71666222711335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:20:54,287] [INFO] [timer.py:197:stop] 0/5328, RunningAvgSamplesPerSec=6.331788167273932, CurrSamplesPerSec=5.723797489623256, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:21:05,551] [INFO] [timer.py:197:stop] 0/5330, RunningAvgSamplesPerSec=6.331800719530591, CurrSamplesPerSec=5.728999480872663, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:21:16,893] [INFO] [timer.py:197:stop] 0/5332, RunningAvgSamplesPerSec=6.331788250487433, CurrSamplesPerSec=5.615644743162119, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:21:28,160] [INFO] [timer.py:197:stop] 0/5334, RunningAvgSamplesPerSec=6.331799635445216, CurrSamplesPerSec=5.722074454648978, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:21:39,415] [INFO] [timer.py:197:stop] 0/5336, RunningAvgSamplesPerSec=6.331810093831878, CurrSamplesPerSec=5.7304957150562466, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:21:50,651] [INFO] [timer.py:197:stop] 0/5338, RunningAvgSamplesPerSec=6.331828149224771, CurrSamplesPerSec=5.744462205318481, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:22:01,961] [INFO] [logging.py:68:log_dist] [Rank 0] step=2670, skipped=5, lr=[5.1911111111111116e-06], mom=[[0.9, 0.999]] [2022-12-17 05:22:01,963] [INFO] [timer.py:197:stop] 0/5340, RunningAvgSamplesPerSec=6.331828700824646, CurrSamplesPerSec=5.703318723094472, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:22:13,215] [INFO] [timer.py:197:stop] 0/5342, RunningAvgSamplesPerSec=6.331843623981074, CurrSamplesPerSec=5.747786195521225, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 05:22:24,454] [INFO] [timer.py:197:stop] 0/5344, RunningAvgSamplesPerSec=6.3318581877227516, CurrSamplesPerSec=5.73446230067685, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:22:35,701] [INFO] [timer.py:197:stop] 0/5346, RunningAvgSamplesPerSec=6.331876988632842, CurrSamplesPerSec=5.737213559784555, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:22:46,939] [INFO] [timer.py:197:stop] 0/5348, RunningAvgSamplesPerSec=6.331884598979178, CurrSamplesPerSec=5.723346681922929, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:22:58,261] [INFO] [timer.py:197:stop] 0/5350, RunningAvgSamplesPerSec=6.331906735436208, CurrSamplesPerSec=5.763013587580053, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0012, 'learning_rate': 5.18e-06, 'epoch': 11.33} [2022-12-17 05:23:09,544] [INFO] [timer.py:197:stop] 0/5352, RunningAvgSamplesPerSec=6.331915096942259, CurrSamplesPerSec=5.732819765586832, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:23:20,792] [INFO] [timer.py:197:stop] 0/5354, RunningAvgSamplesPerSec=6.331930152230148, CurrSamplesPerSec=5.7316884677646, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:23:32,086] [INFO] [timer.py:197:stop] 0/5356, RunningAvgSamplesPerSec=6.3319429083498004, CurrSamplesPerSec=5.744001253755843, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:23:43,371] [INFO] [timer.py:197:stop] 0/5358, RunningAvgSamplesPerSec=6.331957376154482, CurrSamplesPerSec=5.740309183024957, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:23:54,647] [INFO] [logging.py:68:log_dist] [Rank 0] step=2680, skipped=5, lr=[5.168888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 05:23:54,648] [INFO] [timer.py:197:stop] 0/5360, RunningAvgSamplesPerSec=6.331973042164556, CurrSamplesPerSec=5.735553271893245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:24:05,930] [INFO] [timer.py:197:stop] 0/5362, RunningAvgSamplesPerSec=6.3319876983431085, 
CurrSamplesPerSec=5.732323467223491, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:24:17,235] [INFO] [timer.py:197:stop] 0/5364, RunningAvgSamplesPerSec=6.33199726527146, CurrSamplesPerSec=5.740857202934483, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:24:28,512] [INFO] [timer.py:197:stop] 0/5366, RunningAvgSamplesPerSec=6.3320090287277955, CurrSamplesPerSec=5.725859613384882, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:24:39,937] [INFO] [timer.py:197:stop] 0/5368, RunningAvgSamplesPerSec=6.332025183926404, CurrSamplesPerSec=5.734573290236572, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:24:51,212] [INFO] [timer.py:197:stop] 0/5370, RunningAvgSamplesPerSec=6.3320424111755464, CurrSamplesPerSec=5.725142768419131, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:25:02,503] [INFO] [timer.py:197:stop] 0/5372, RunningAvgSamplesPerSec=6.332055348608328, CurrSamplesPerSec=5.726065296764906, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:25:13,783] [INFO] [timer.py:197:stop] 0/5374, RunningAvgSamplesPerSec=6.332067066686548, CurrSamplesPerSec=5.70496088367438, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:25:25,093] [INFO] [timer.py:197:stop] 0/5376, RunningAvgSamplesPerSec=6.332075611655805, CurrSamplesPerSec=5.725530843801487, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:25:36,352] [INFO] [timer.py:197:stop] 0/5378, RunningAvgSamplesPerSec=6.332091996769401, CurrSamplesPerSec=5.747384268558034, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:25:47,653] [INFO] [logging.py:68:log_dist] [Rank 0] step=2690, skipped=5, lr=[5.146666666666668e-06], mom=[[0.9, 0.999]] [2022-12-17 05:25:47,655] [INFO] [timer.py:197:stop] 0/5380, RunningAvgSamplesPerSec=6.332102105174284, CurrSamplesPerSec=5.742849321290859, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:25:58,906] [INFO] [timer.py:197:stop] 0/5382, RunningAvgSamplesPerSec=6.332119985619664, 
CurrSamplesPerSec=5.75074712347447, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:26:10,209] [INFO] [timer.py:197:stop] 0/5384, RunningAvgSamplesPerSec=6.332129202412391, CurrSamplesPerSec=5.7269793252631995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:26:21,476] [INFO] [timer.py:197:stop] 0/5386, RunningAvgSamplesPerSec=6.332142626128737, CurrSamplesPerSec=5.722102508797743, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:26:32,768] [INFO] [timer.py:197:stop] 0/5388, RunningAvgSamplesPerSec=6.332153973047589, CurrSamplesPerSec=5.720645764517236, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:26:44,052] [INFO] [timer.py:197:stop] 0/5390, RunningAvgSamplesPerSec=6.332168222200867, CurrSamplesPerSec=5.758247911655947, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:26:55,350] [INFO] [timer.py:197:stop] 0/5392, RunningAvgSamplesPerSec=6.332179117202303, CurrSamplesPerSec=5.721149797514245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:27:06,619] [INFO] [timer.py:197:stop] 0/5394, RunningAvgSamplesPerSec=6.332194122074683, CurrSamplesPerSec=5.742245153181966, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:27:17,923] [INFO] [timer.py:197:stop] 0/5396, RunningAvgSamplesPerSec=6.332204587974897, CurrSamplesPerSec=5.7320757174664925, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:27:29,221] [INFO] [timer.py:197:stop] 0/5398, RunningAvgSamplesPerSec=6.332216127969198, CurrSamplesPerSec=5.722261568824939, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:27:40,523] [INFO] [logging.py:68:log_dist] [Rank 0] step=2700, skipped=5, lr=[5.124444444444445e-06], mom=[[0.9, 0.999]]
[2022-12-17 05:27:40,525] [INFO] [timer.py:197:stop] 0/5400, RunningAvgSamplesPerSec=6.332226431537867, CurrSamplesPerSec=5.727787558729274, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0011, 'learning_rate': 5.124444444444445e-06, 'epoch': 11.44}
[2022-12-17 05:27:51,834] [INFO] [timer.py:197:stop] 0/5402, RunningAvgSamplesPerSec=6.3322330237590885, CurrSamplesPerSec=5.6972134076352186, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:28:03,079] [INFO] [timer.py:197:stop] 0/5404, RunningAvgSamplesPerSec=6.332250320317892, CurrSamplesPerSec=5.752729839098712, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:28:14,392] [INFO] [timer.py:197:stop] 0/5406, RunningAvgSamplesPerSec=6.332256759997134, CurrSamplesPerSec=5.721441235920432, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:28:25,656] [INFO] [timer.py:197:stop] 0/5408, RunningAvgSamplesPerSec=6.332269340859704, CurrSamplesPerSec=5.725855949325616, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:28:36,913] [INFO] [timer.py:197:stop] 0/5410, RunningAvgSamplesPerSec=6.3322827126004, CurrSamplesPerSec=5.731097413139867, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:28:48,192] [INFO] [timer.py:197:stop] 0/5412, RunningAvgSamplesPerSec=6.332287783761879, CurrSamplesPerSec=5.724056974223865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:28:59,486] [INFO] [timer.py:197:stop] 0/5414, RunningAvgSamplesPerSec=6.332293646201773, CurrSamplesPerSec=5.699748942659988, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:29:10,933] [INFO] [timer.py:197:stop] 0/5416, RunningAvgSamplesPerSec=6.3323149190795425, CurrSamplesPerSec=5.744511623716563, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:29:22,206] [INFO] [timer.py:197:stop] 0/5418, RunningAvgSamplesPerSec=6.332322147474618, CurrSamplesPerSec=5.681815807331453, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:29:33,427] [INFO] [logging.py:68:log_dist] [Rank 0] step=2710, skipped=5, lr=[5.102222222222223e-06], mom=[[0.9, 0.999]]
[2022-12-17 05:29:33,429] [INFO] [timer.py:197:stop] 0/5420, RunningAvgSamplesPerSec=6.332336561497905, CurrSamplesPerSec=5.7329387724417895, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:29:44,696] [INFO] [timer.py:197:stop] 0/5422, RunningAvgSamplesPerSec=6.332344831258414, CurrSamplesPerSec=5.70700921207328, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:29:55,939] [INFO] [timer.py:197:stop] 0/5424, RunningAvgSamplesPerSec=6.332363229604519, CurrSamplesPerSec=5.73533685803425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:30:07,205] [INFO] [timer.py:197:stop] 0/5426, RunningAvgSamplesPerSec=6.332375270831397, CurrSamplesPerSec=5.724526938018898, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:30:18,440] [INFO] [timer.py:197:stop] 0/5428, RunningAvgSamplesPerSec=6.332391038467731, CurrSamplesPerSec=5.740212455607973, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:30:29,715] [INFO] [timer.py:197:stop] 0/5430, RunningAvgSamplesPerSec=6.3324006160843105, CurrSamplesPerSec=5.7166308175061005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:30:40,990] [INFO] [timer.py:197:stop] 0/5432, RunningAvgSamplesPerSec=6.332410610941344, CurrSamplesPerSec=5.717756665674356, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:30:52,266] [INFO] [timer.py:197:stop] 0/5434, RunningAvgSamplesPerSec=6.332420587194725, CurrSamplesPerSec=5.7336568363246085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:31:03,530] [INFO] [timer.py:197:stop] 0/5436, RunningAvgSamplesPerSec=6.3324257712121925, CurrSamplesPerSec=5.696499365064274, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:31:14,767] [INFO] [timer.py:197:stop] 0/5438, RunningAvgSamplesPerSec=6.332444231804994, CurrSamplesPerSec=5.7353971483628605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:31:26,034] [INFO] [logging.py:68:log_dist] [Rank 0] step=2720, skipped=5, lr=[5.0800000000000005e-06], mom=[[0.9, 0.999]]
[2022-12-17 05:31:26,036] [INFO] [timer.py:197:stop] 0/5440, RunningAvgSamplesPerSec=6.3324517740998765, CurrSamplesPerSec=5.721149797514245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:31:37,312] [INFO] [timer.py:197:stop] 0/5442, RunningAvgSamplesPerSec=6.332461417845187, CurrSamplesPerSec=5.694489497472948, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:31:48,559] [INFO] [timer.py:197:stop] 0/5444, RunningAvgSamplesPerSec=6.332477879576189, CurrSamplesPerSec=5.737828443549427, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:31:59,873] [INFO] [timer.py:197:stop] 0/5446, RunningAvgSamplesPerSec=6.332479712229146, CurrSamplesPerSec=5.698187434566532, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:32:11,115] [INFO] [timer.py:197:stop] 0/5448, RunningAvgSamplesPerSec=6.332497359605725, CurrSamplesPerSec=5.72765410016634, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:32:22,609] [INFO] [timer.py:197:stop] 0/5450, RunningAvgSamplesPerSec=6.332510747841616, CurrSamplesPerSec=5.726563199659353, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.001, 'learning_rate': 5.06888888888889e-06, 'epoch': 11.55}
[2022-12-17 05:32:33,869] [INFO] [timer.py:197:stop] 0/5452, RunningAvgSamplesPerSec=6.332524405237324, CurrSamplesPerSec=5.726556114074656, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:32:45,116] [INFO] [timer.py:197:stop] 0/5454, RunningAvgSamplesPerSec=6.332536431214967, CurrSamplesPerSec=5.723373772312986, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:32:56,506] [INFO] [timer.py:197:stop] 0/5456, RunningAvgSamplesPerSec=6.332548455978119, CurrSamplesPerSec=5.7462179403283145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:33:07,765] [INFO] [timer.py:197:stop] 0/5458, RunningAvgSamplesPerSec=6.332557599737585, CurrSamplesPerSec=5.7280026698697135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:33:19,023] [INFO] [logging.py:68:log_dist] [Rank 0] step=2730, skipped=5, lr=[5.057777777777778e-06], mom=[[0.9, 0.999]]
[2022-12-17 05:33:19,025] [INFO] [timer.py:197:stop] 0/5460, RunningAvgSamplesPerSec=6.332571210787725, CurrSamplesPerSec=5.732027736787424, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:33:30,266] [INFO] [timer.py:197:stop] 0/5462, RunningAvgSamplesPerSec=6.332581482932561, CurrSamplesPerSec=5.711506667941586, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:33:41,533] [INFO] [timer.py:197:stop] 0/5464, RunningAvgSamplesPerSec=6.332593738459713, CurrSamplesPerSec=5.729059148948412, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:33:52,804] [INFO] [timer.py:197:stop] 0/5466, RunningAvgSamplesPerSec=6.332604211069591, CurrSamplesPerSec=5.727068031611117, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:34:04,060] [INFO] [timer.py:197:stop] 0/5468, RunningAvgSamplesPerSec=6.3326183044704, CurrSamplesPerSec=5.736048413717722, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:34:15,314] [INFO] [timer.py:197:stop] 0/5470, RunningAvgSamplesPerSec=6.3326320905877935, CurrSamplesPerSec=5.743125526686731, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:34:26,583] [INFO] [timer.py:197:stop] 0/5472, RunningAvgSamplesPerSec=6.332643210929138, CurrSamplesPerSec=5.72707291909308, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:34:37,832] [INFO] [timer.py:197:stop] 0/5474, RunningAvgSamplesPerSec=6.332658774956965, CurrSamplesPerSec=5.7351113932424616, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:34:49,079] [INFO] [timer.py:197:stop] 0/5476, RunningAvgSamplesPerSec=6.332671219062701, CurrSamplesPerSec=5.729178977936781, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:35:00,334] [INFO] [timer.py:197:stop] 0/5478, RunningAvgSamplesPerSec=6.332685789743696, CurrSamplesPerSec=5.738412545745183, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:35:11,572] [INFO] [logging.py:68:log_dist] [Rank 0] step=2740, skipped=5, lr=[5.035555555555556e-06], mom=[[0.9, 0.999]]
[2022-12-17 05:35:11,574] [INFO] [timer.py:197:stop] 0/5480, RunningAvgSamplesPerSec=6.332703090955735, CurrSamplesPerSec=5.726971749917317, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:35:22,856] [INFO] [timer.py:197:stop] 0/5482, RunningAvgSamplesPerSec=6.3327110287281245, CurrSamplesPerSec=5.710764253771904, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:35:34,087] [INFO] [timer.py:197:stop] 0/5484, RunningAvgSamplesPerSec=6.332726896523725, CurrSamplesPerSec=5.744172104506499, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:35:45,344] [INFO] [timer.py:197:stop] 0/5486, RunningAvgSamplesPerSec=6.332737779608204, CurrSamplesPerSec=5.7382054837343786, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:35:56,769] [INFO] [timer.py:197:stop] 0/5488, RunningAvgSamplesPerSec=6.332749506085508, CurrSamplesPerSec=5.738857387065375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:36:08,018] [INFO] [timer.py:197:stop] 0/5490, RunningAvgSamplesPerSec=6.332762552584889, CurrSamplesPerSec=5.727755782316844, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:36:19,293] [INFO] [timer.py:197:stop] 0/5492, RunningAvgSamplesPerSec=6.332772741523956, CurrSamplesPerSec=5.733351171866924, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:36:30,595] [INFO] [timer.py:197:stop] 0/5494, RunningAvgSamplesPerSec=6.332773836264852, CurrSamplesPerSec=5.676408612249814, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:36:41,894] [INFO] [timer.py:197:stop] 0/5496, RunningAvgSamplesPerSec=6.332779233746077, CurrSamplesPerSec=5.698465892839091, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:36:53,225] [INFO] [timer.py:197:stop] 0/5498, RunningAvgSamplesPerSec=6.332780467389881, CurrSamplesPerSec=5.677425009408602, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:37:04,548] [INFO] [logging.py:68:log_dist] [Rank 0] step=2750, skipped=5, lr=[5.013333333333333e-06], mom=[[0.9, 0.999]]
[2022-12-17 05:37:04,550] [INFO] [timer.py:197:stop] 0/5500, RunningAvgSamplesPerSec=6.33278092368322, CurrSamplesPerSec=5.686835714271792, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0012, 'learning_rate': 5.013333333333333e-06, 'epoch': 11.65}
[2022-12-17 05:37:15,905] [INFO] [timer.py:197:stop] 0/5502, RunningAvgSamplesPerSec=6.332783612977782, CurrSamplesPerSec=5.701471877168966, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:37:27,211] [INFO] [timer.py:197:stop] 0/5504, RunningAvgSamplesPerSec=6.332793342968272, CurrSamplesPerSec=5.712175856621704, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:37:38,561] [INFO] [timer.py:197:stop] 0/5506, RunningAvgSamplesPerSec=6.332789638900559, CurrSamplesPerSec=5.665064984738845, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:37:49,847] [INFO] [timer.py:197:stop] 0/5508, RunningAvgSamplesPerSec=6.332797026608971, CurrSamplesPerSec=5.710434299877502, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:38:01,243] [INFO] [timer.py:197:stop] 0/5510, RunningAvgSamplesPerSec=6.3328017604467, CurrSamplesPerSec=5.685818353795128, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:38:12,604] [INFO] [timer.py:197:stop] 0/5512, RunningAvgSamplesPerSec=6.3328063775165155, CurrSamplesPerSec=5.723522652006832, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:38:23,903] [INFO] [timer.py:197:stop] 0/5514, RunningAvgSamplesPerSec=6.33281705103715, CurrSamplesPerSec=5.715605203494661, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:38:35,238] [INFO] [timer.py:197:stop] 0/5516, RunningAvgSamplesPerSec=6.332823395964448, CurrSamplesPerSec=5.708910398023532, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:38:46,529] [INFO] [timer.py:197:stop] 0/5518, RunningAvgSamplesPerSec=6.33282605944014, CurrSamplesPerSec=5.6991516306995935, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:38:57,861] [INFO] [logging.py:68:log_dist] [Rank 0] step=2760, skipped=5, lr=[4.991111111111112e-06], mom=[[0.9, 0.999]]
[2022-12-17 05:38:57,863] [INFO] [timer.py:197:stop] 0/5520, RunningAvgSamplesPerSec=6.33282949171402, CurrSamplesPerSec=5.710830589312635, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:39:09,216] [INFO] [timer.py:197:stop] 0/5522, RunningAvgSamplesPerSec=6.332837247559131, CurrSamplesPerSec=5.6983633125161095, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:39:20,443] [INFO] [timer.py:197:stop] 0/5524, RunningAvgSamplesPerSec=6.332858539457194, CurrSamplesPerSec=5.745685130183153, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:39:31,700] [INFO] [timer.py:197:stop] 0/5526, RunningAvgSamplesPerSec=6.332865922446157, CurrSamplesPerSec=5.710178478218908, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:39:42,975] [INFO] [timer.py:197:stop] 0/5528, RunningAvgSamplesPerSec=6.332877851007105, CurrSamplesPerSec=5.728147390091605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:39:54,237] [INFO] [timer.py:197:stop] 0/5530, RunningAvgSamplesPerSec=6.332890099141983, CurrSamplesPerSec=5.72886621035573, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:40:05,451] [INFO] [timer.py:197:stop] 0/5532, RunningAvgSamplesPerSec=6.332902675261394, CurrSamplesPerSec=5.730549542190849, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:40:16,803] [INFO] [timer.py:197:stop] 0/5534, RunningAvgSamplesPerSec=6.33290837551137, CurrSamplesPerSec=5.707021102686406, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:40:28,091] [INFO] [timer.py:197:stop] 0/5536, RunningAvgSamplesPerSec=6.332914417331108, CurrSamplesPerSec=5.70401605311191, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:40:39,392] [INFO] [timer.py:197:stop] 0/5538, RunningAvgSamplesPerSec=6.33291884533572, CurrSamplesPerSec=5.687822104017034, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:40:50,946] [INFO] [logging.py:68:log_dist] [Rank 0] step=2770, skipped=5, lr=[4.968888888888889e-06], mom=[[0.9, 0.999]]
[2022-12-17 05:40:50,948] [INFO] [timer.py:197:stop] 0/5540, RunningAvgSamplesPerSec=6.332925819119135, CurrSamplesPerSec=5.72190881880358, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:41:02,276] [INFO] [timer.py:197:stop] 0/5542, RunningAvgSamplesPerSec=6.3329356245064075, CurrSamplesPerSec=5.722274254974737, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:41:13,559] [INFO] [timer.py:197:stop] 0/5544, RunningAvgSamplesPerSec=6.332943514845411, CurrSamplesPerSec=5.684454414219593, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:41:24,945] [INFO] [timer.py:197:stop] 0/5546, RunningAvgSamplesPerSec=6.332950889628434, CurrSamplesPerSec=5.713319163448986, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:41:36,237] [INFO] [timer.py:197:stop] 0/5548, RunningAvgSamplesPerSec=6.332957682383578, CurrSamplesPerSec=5.703936058737852, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:41:47,562] [INFO] [timer.py:197:stop] 0/5550, RunningAvgSamplesPerSec=6.332956047074123, CurrSamplesPerSec=5.685354242281351, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0012, 'learning_rate': 4.957777777777778e-06, 'epoch': 11.76}
[2022-12-17 05:41:58,980] [INFO] [timer.py:197:stop] 0/5552, RunningAvgSamplesPerSec=6.332966331820771, CurrSamplesPerSec=5.733141536241398, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:42:10,228] [INFO] [timer.py:197:stop] 0/5554, RunningAvgSamplesPerSec=6.332978090415024, CurrSamplesPerSec=5.718462160113856, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:42:21,520] [INFO] [timer.py:197:stop] 0/5556, RunningAvgSamplesPerSec=6.332983898220038, CurrSamplesPerSec=5.697712836015588, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:42:32,801] [INFO] [timer.py:197:stop] 0/5558, RunningAvgSamplesPerSec=6.332992313775625, CurrSamplesPerSec=5.715544841767603, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:42:44,184] [INFO] [logging.py:68:log_dist] [Rank 0] step=2780, skipped=5, lr=[4.946666666666667e-06], mom=[[0.9, 0.999]]
[2022-12-17 05:42:44,186] [INFO] [timer.py:197:stop] 0/5560, RunningAvgSamplesPerSec=6.332978351255422, CurrSamplesPerSec=5.627949114964652, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:42:55,885] [INFO] [timer.py:197:stop] 0/5562, RunningAvgSamplesPerSec=6.332894527720718, CurrSamplesPerSec=5.328057955897367, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:43:07,213] [INFO] [timer.py:197:stop] 0/5564, RunningAvgSamplesPerSec=6.332891784684995, CurrSamplesPerSec=5.661491093777932, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:43:18,518] [INFO] [timer.py:197:stop] 0/5566, RunningAvgSamplesPerSec=6.332894393890626, CurrSamplesPerSec=5.710761823927876, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:43:29,902] [INFO] [timer.py:197:stop] 0/5568, RunningAvgSamplesPerSec=6.332878696942575, CurrSamplesPerSec=5.613556043012562, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:43:41,167] [INFO] [timer.py:197:stop] 0/5570, RunningAvgSamplesPerSec=6.332889244247719, CurrSamplesPerSec=5.7199923851472185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:43:52,430] [INFO] [timer.py:197:stop] 0/5572, RunningAvgSamplesPerSec=6.332901017021467, CurrSamplesPerSec=5.740034966821891, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:44:03,768] [INFO] [timer.py:197:stop] 0/5574, RunningAvgSamplesPerSec=6.332896168709029, CurrSamplesPerSec=5.663117615475737, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:44:15,088] [INFO] [timer.py:197:stop] 0/5576, RunningAvgSamplesPerSec=6.332895772682674, CurrSamplesPerSec=5.692906722591734, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:44:26,629] [INFO] [timer.py:197:stop] 0/5578, RunningAvgSamplesPerSec=6.332905969270301, CurrSamplesPerSec=5.718397839872948, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:44:38,107] [INFO] [logging.py:68:log_dist] [Rank 0] step=2790, skipped=5, lr=[4.924444444444445e-06], mom=[[0.9, 0.999]]
[2022-12-17 05:44:38,108] [INFO] [timer.py:197:stop] 0/5580, RunningAvgSamplesPerSec=6.332897487884689, CurrSamplesPerSec=5.719037939441172, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:44:49,419] [INFO] [timer.py:197:stop] 0/5582, RunningAvgSamplesPerSec=6.332897838621812, CurrSamplesPerSec=5.70153751260587, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:45:00,687] [INFO] [timer.py:197:stop] 0/5584, RunningAvgSamplesPerSec=6.3329080849694765, CurrSamplesPerSec=5.710074018225663, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:45:12,195] [INFO] [timer.py:197:stop] 0/5586, RunningAvgSamplesPerSec=6.332903922957749, CurrSamplesPerSec=5.720801817491398, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:45:23,531] [INFO] [timer.py:197:stop] 0/5588, RunningAvgSamplesPerSec=6.332901741022134, CurrSamplesPerSec=5.674606021061, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:45:34,856] [INFO] [timer.py:197:stop] 0/5590, RunningAvgSamplesPerSec=6.332897804579388, CurrSamplesPerSec=5.681754473526077, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:45:46,217] [INFO] [timer.py:197:stop] 0/5592, RunningAvgSamplesPerSec=6.33288074159132, CurrSamplesPerSec=5.6802431940249365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:45:57,527] [INFO] [timer.py:197:stop] 0/5594, RunningAvgSamplesPerSec=6.3328827302159905, CurrSamplesPerSec=5.692431554689089, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:46:08,808] [INFO] [timer.py:197:stop] 0/5596, RunningAvgSamplesPerSec=6.332891171460052, CurrSamplesPerSec=5.714777044771578, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:46:20,139] [INFO] [timer.py:197:stop] 0/5598, RunningAvgSamplesPerSec=6.332887723565751, CurrSamplesPerSec=5.698770752387616, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:46:31,417] [INFO] [logging.py:68:log_dist] [Rank 0] step=2800, skipped=5, lr=[4.902222222222222e-06], mom=[[0.9, 0.999]]
[2022-12-17 05:46:31,419] [INFO] [timer.py:197:stop] 0/5600, RunningAvgSamplesPerSec=6.332889171708729, CurrSamplesPerSec=5.688668025550359, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0017, 'learning_rate': 4.902222222222222e-06, 'epoch': 11.86}
[2022-12-17 05:46:42,751] [INFO] [timer.py:197:stop] 0/5602, RunningAvgSamplesPerSec=6.332892828513426, CurrSamplesPerSec=5.702749981251817, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:46:54,210] [INFO] [timer.py:197:stop] 0/5604, RunningAvgSamplesPerSec=6.332901936235582, CurrSamplesPerSec=5.735084681740569, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:47:05,700] [INFO] [timer.py:197:stop] 0/5606, RunningAvgSamplesPerSec=6.33290293370556, CurrSamplesPerSec=5.691225158507459, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:47:17,388] [INFO] [timer.py:197:stop] 0/5608, RunningAvgSamplesPerSec=6.332820121132463, CurrSamplesPerSec=5.3429333136735115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:47:28,766] [INFO] [timer.py:197:stop] 0/5610, RunningAvgSamplesPerSec=6.332821523370211, CurrSamplesPerSec=5.702558326024797, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:47:40,296] [INFO] [timer.py:197:stop] 0/5612, RunningAvgSamplesPerSec=6.332829560330361, CurrSamplesPerSec=5.71075842214971, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:47:51,692] [INFO] [timer.py:197:stop] 0/5614, RunningAvgSamplesPerSec=6.332812496434912, CurrSamplesPerSec=5.608840873003154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:48:03,251] [INFO] [timer.py:197:stop] 0/5616, RunningAvgSamplesPerSec=6.332808657762249, CurrSamplesPerSec=5.6850399804175815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:48:14,569] [INFO] [timer.py:197:stop] 0/5618, RunningAvgSamplesPerSec=6.332807418728847, CurrSamplesPerSec=5.673635721564932, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:48:25,871] [INFO] [logging.py:68:log_dist] [Rank 0] step=2810, skipped=5, lr=[4.880000000000001e-06], mom=[[0.9, 0.999]]
[2022-12-17 05:48:25,872] [INFO] [timer.py:197:stop] 0/5620, RunningAvgSamplesPerSec=6.332805826090633, CurrSamplesPerSec=5.68405455531197, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:48:37,425] [INFO] [timer.py:197:stop] 0/5622, RunningAvgSamplesPerSec=6.332799703924759, CurrSamplesPerSec=5.670917772159438, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:48:48,976] [INFO] [timer.py:197:stop] 0/5624, RunningAvgSamplesPerSec=6.332805524442007, CurrSamplesPerSec=5.699043943930853, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:49:00,253] [INFO] [timer.py:197:stop] 0/5626, RunningAvgSamplesPerSec=6.332806477016322, CurrSamplesPerSec=5.683839843942882, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:49:11,510] [INFO] [timer.py:197:stop] 0/5628, RunningAvgSamplesPerSec=6.3328085207659495, CurrSamplesPerSec=5.6995364324043365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:49:22,835] [INFO] [timer.py:197:stop] 0/5630, RunningAvgSamplesPerSec=6.332809979530304, CurrSamplesPerSec=5.70279868456956, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:49:34,163] [INFO] [timer.py:197:stop] 0/5632, RunningAvgSamplesPerSec=6.33280764518486, CurrSamplesPerSec=5.694752613713792, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:49:45,517] [INFO] [timer.py:197:stop] 0/5634, RunningAvgSamplesPerSec=6.332799728337538, CurrSamplesPerSec=5.660992264487115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:49:56,825] [INFO] [timer.py:197:stop] 0/5636, RunningAvgSamplesPerSec=6.332801426210607, CurrSamplesPerSec=5.695056109472525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:50:08,283] [INFO] [timer.py:197:stop] 0/5638, RunningAvgSamplesPerSec=6.332765862833, CurrSamplesPerSec=5.530515181442866, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:50:19,615] [INFO] [logging.py:68:log_dist] [Rank 0] step=2820, skipped=5, lr=[4.857777777777778e-06], mom=[[0.9, 0.999]]
[2022-12-17 05:50:19,616] [INFO] [timer.py:197:stop] 0/5640, RunningAvgSamplesPerSec=6.33276122207884, CurrSamplesPerSec=5.680091028575591, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:50:30,917] [INFO] [timer.py:197:stop] 0/5642, RunningAvgSamplesPerSec=6.332763728625373, CurrSamplesPerSec=5.711915503414802, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:50:42,205] [INFO] [timer.py:197:stop] 0/5644, RunningAvgSamplesPerSec=6.332769598022131, CurrSamplesPerSec=5.701877339420423, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:50:53,471] [INFO] [timer.py:197:stop] 0/5646, RunningAvgSamplesPerSec=6.332780589594118, CurrSamplesPerSec=5.726134919783692, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:51:04,789] [INFO] [timer.py:197:stop] 0/5648, RunningAvgSamplesPerSec=6.332779730197156, CurrSamplesPerSec=5.677553735791651, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:51:16,212] [INFO] [timer.py:197:stop] 0/5650, RunningAvgSamplesPerSec=6.332758376727921, CurrSamplesPerSec=5.638280024107815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0011, 'learning_rate': 4.846666666666667e-06, 'epoch': 11.97}
[2022-12-17 05:51:27,493] [INFO] [timer.py:197:stop] 0/5652, RunningAvgSamplesPerSec=6.332766430333524, CurrSamplesPerSec=5.727767270671415, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:51:38,883] [INFO] [timer.py:197:stop] 0/5654, RunningAvgSamplesPerSec=6.332771971674966, CurrSamplesPerSec=5.71815494655829, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:51:50,324] [INFO] [timer.py:197:stop] 0/5656, RunningAvgSamplesPerSec=6.332751991785921, CurrSamplesPerSec=5.691306486070565, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:52:01,597] [INFO] [timer.py:197:stop] 0/5658, RunningAvgSamplesPerSec=6.332761303218689, CurrSamplesPerSec=5.726962708401693, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:52:12,901] [INFO] [logging.py:68:log_dist] [Rank 0] step=2830, skipped=5, lr=[4.835555555555556e-06], mom=[[0.9, 0.999]]
[2022-12-17 05:52:12,902] [INFO] [timer.py:197:stop] 0/5660, RunningAvgSamplesPerSec=6.332763278259856, CurrSamplesPerSec=5.6826154326731615, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:52:24,257] [INFO] [timer.py:197:stop] 0/5662, RunningAvgSamplesPerSec=6.332754392749188, CurrSamplesPerSec=5.706142058898655, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:52:32,752] [INFO] [timer.py:197:stop] 0/5664, RunningAvgSamplesPerSec=6.333304594674231, CurrSamplesPerSec=10.171249085693656, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:52:44,051] [INFO] [timer.py:197:stop] 0/5666, RunningAvgSamplesPerSec=6.333307790781571, CurrSamplesPerSec=5.710589310286816, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:52:55,435] [INFO] [timer.py:197:stop] 0/5668, RunningAvgSamplesPerSec=6.333290583434095, CurrSamplesPerSec=5.666261032287638, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:53:06,766] [INFO] [timer.py:197:stop] 0/5670, RunningAvgSamplesPerSec=6.33328694885843, CurrSamplesPerSec=5.692666714254806, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:53:18,101] [INFO] [timer.py:197:stop] 0/5672, RunningAvgSamplesPerSec=6.333282186189515, CurrSamplesPerSec=5.670440039003418, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:53:29,691] [INFO] [timer.py:197:stop] 0/5674, RunningAvgSamplesPerSec=6.333221424987277, CurrSamplesPerSec=5.713069892305831, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:53:41,004] [INFO] [timer.py:197:stop] 0/5676, RunningAvgSamplesPerSec=6.333218738932709, CurrSamplesPerSec=5.699126221095192, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:53:52,433] [INFO] [timer.py:197:stop] 0/5678, RunningAvgSamplesPerSec=6.3332006328558315, CurrSamplesPerSec=5.607977987206557, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:54:03,976] [INFO] [logging.py:68:log_dist] [Rank 0] step=2840, skipped=5, lr=[4.8133333333333336e-06], mom=[[0.9, 0.999]]
[2022-12-17 05:54:03,978] [INFO] [timer.py:197:stop] 0/5680, RunningAvgSamplesPerSec=6.333199732486365, CurrSamplesPerSec=5.711543125327429, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:54:15,293] [INFO] [timer.py:197:stop] 0/5682, RunningAvgSamplesPerSec=6.333195939860297, CurrSamplesPerSec=5.6958679262802505, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:54:26,666] [INFO] [timer.py:197:stop] 0/5684, RunningAvgSamplesPerSec=6.33318316638135, CurrSamplesPerSec=5.636633885696688, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:54:37,975] [INFO] [timer.py:197:stop] 0/5686, RunningAvgSamplesPerSec=6.3331854938121515, CurrSamplesPerSec=5.702362565085231, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:54:49,305] [INFO] [timer.py:197:stop] 0/5688, RunningAvgSamplesPerSec=6.333182590483902, CurrSamplesPerSec=5.704248049485593, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:55:00,779] [INFO] [timer.py:197:stop] 0/5690, RunningAvgSamplesPerSec=6.33314737555965, CurrSamplesPerSec=5.53050811692653, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:55:12,113] [INFO] [timer.py:197:stop] 0/5692, RunningAvgSamplesPerSec=6.333142736354172, CurrSamplesPerSec=5.687230903873154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:55:23,408] [INFO] [timer.py:197:stop] 0/5694, RunningAvgSamplesPerSec=6.333146815189666, CurrSamplesPerSec=5.720327101529128, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:55:35,018] [INFO] [timer.py:197:stop] 0/5696, RunningAvgSamplesPerSec=6.333081745735081, CurrSamplesPerSec=5.42185939073298, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:55:46,331] [INFO] [timer.py:197:stop] 0/5698, RunningAvgSamplesPerSec=6.3330818798007895, CurrSamplesPerSec=5.698708810029422, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:55:57,840] [INFO] [logging.py:68:log_dist] [Rank 0] step=2850, skipped=5, lr=[4.791111111111111e-06], mom=[[0.9, 0.999]]
[2022-12-17 05:55:57,841] [INFO] [timer.py:197:stop] 0/5700, RunningAvgSamplesPerSec=6.333078349179951, CurrSamplesPerSec=5.692917347170979, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0008, 'learning_rate': 4.791111111111111e-06, 'epoch': 12.08}
[2022-12-17 05:56:09,228] [INFO] [timer.py:197:stop] 0/5702, RunningAvgSamplesPerSec=6.333059736306311, CurrSamplesPerSec=5.597479915333243, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:56:20,516] [INFO] [timer.py:197:stop] 0/5704, RunningAvgSamplesPerSec=6.333065159279361, CurrSamplesPerSec=5.715673598766464, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:56:31,823] [INFO] [timer.py:197:stop] 0/5706, RunningAvgSamplesPerSec=6.333066730706159, CurrSamplesPerSec=5.701342790478605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:56:43,423] [INFO] [timer.py:197:stop] 0/5708, RunningAvgSamplesPerSec=6.333056047616498, CurrSamplesPerSec=5.715045689417052, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:56:54,704] [INFO] [timer.py:197:stop] 0/5710, RunningAvgSamplesPerSec=6.33306372097821, CurrSamplesPerSec=5.709811184638263, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:57:06,048] [INFO] [timer.py:197:stop] 0/5712, RunningAvgSamplesPerSec=6.333071282850042, CurrSamplesPerSec=5.705976131253787, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:57:17,368] [INFO] [timer.py:197:stop] 0/5714, RunningAvgSamplesPerSec=6.33306681450364, CurrSamplesPerSec=5.652331112271843, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:57:28,654] [INFO] [timer.py:197:stop] 0/5716, RunningAvgSamplesPerSec=6.3330734652637615, CurrSamplesPerSec=5.707509874027401, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:57:40,257] [INFO] [timer.py:197:stop] 0/5718, RunningAvgSamplesPerSec=6.3330681573262, CurrSamplesPerSec=5.672987041388273, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:57:51,815] [INFO] [logging.py:68:log_dist] [Rank 0] step=2860, skipped=5, lr=[4.768888888888889e-06], mom=[[0.9, 0.999]]
[2022-12-17 05:57:51,817] [INFO] [timer.py:197:stop] 0/5720, RunningAvgSamplesPerSec=6.333063883374338, CurrSamplesPerSec=5.687774138233812, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:58:03,114] [INFO] [timer.py:197:stop] 0/5722, RunningAvgSamplesPerSec=6.333068138037818, CurrSamplesPerSec=5.7160036715575195, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:58:14,417] [INFO] [timer.py:197:stop] 0/5724, RunningAvgSamplesPerSec=6.3330711810578535, CurrSamplesPerSec=5.703349017171229, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:58:25,784] [INFO] [timer.py:197:stop] 0/5726, RunningAvgSamplesPerSec=6.3330600285572825, CurrSamplesPerSec=5.726510424690953, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:58:37,095] [INFO] [timer.py:197:stop] 0/5728, RunningAvgSamplesPerSec=6.3330607107411065, CurrSamplesPerSec=5.6806035677858535, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:58:48,461] [INFO] [timer.py:197:stop] 0/5730, RunningAvgSamplesPerSec=6.333046162093266, CurrSamplesPerSec=5.624621524913426, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:58:59,989] [INFO] [timer.py:197:stop] 0/5732, RunningAvgSamplesPerSec=6.333047457967725, CurrSamplesPerSec=5.699362660168836, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:59:11,345] [INFO] [timer.py:197:stop] 0/5734, RunningAvgSamplesPerSec=6.333049646595997, CurrSamplesPerSec=5.7098857569589505, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:59:22,670] [INFO] [timer.py:197:stop] 0/5736, RunningAvgSamplesPerSec=6.3330471630973975, CurrSamplesPerSec=5.656524632617846, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:59:34,092] [INFO] [timer.py:197:stop] 0/5738, RunningAvgSamplesPerSec=6.333058066751148, CurrSamplesPerSec=5.725679347229084, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:59:45,411] [INFO] [logging.py:68:log_dist] [Rank 0] step=2870, skipped=5, lr=[4.746666666666667e-06], mom=[[0.9, 0.999]]
[2022-12-17 05:59:45,413] [INFO] [timer.py:197:stop] 0/5740, RunningAvgSamplesPerSec=6.333056508201623, CurrSamplesPerSec=5.693592330700572, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 05:59:56,699] [INFO] [timer.py:197:stop] 0/5742, RunningAvgSamplesPerSec=6.333056869621617, CurrSamplesPerSec=5.682889724245197, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 06:00:07,969] [INFO] [timer.py:197:stop] 0/5744, RunningAvgSamplesPerSec=6.3330669801920125, CurrSamplesPerSec=5.732264465585629, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 06:00:19,237] [INFO] [timer.py:197:stop] 0/5746, RunningAvgSamplesPerSec=6.333077193576914, CurrSamplesPerSec=5.734636504753801, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 06:00:30,764] [INFO] [timer.py:197:stop] 0/5748, RunningAvgSamplesPerSec=6.333027780024121, CurrSamplesPerSec=5.457497362896864, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 06:00:42,080] [INFO] [timer.py:197:stop] 0/5750, RunningAvgSamplesPerSec=6.333026353581669, CurrSamplesPerSec=5.670943410065879, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0007, 'learning_rate': 4.735555555555556e-06, 'epoch': 12.18}
[2022-12-17 06:00:53,386] [INFO] [timer.py:197:stop] 0/5752, RunningAvgSamplesPerSec=6.333034645889236, CurrSamplesPerSec=5.7249857459432505, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 06:01:04,646] [INFO] [timer.py:197:stop] 0/5754, RunningAvgSamplesPerSec=6.333043290240301, CurrSamplesPerSec=5.723611739462003, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 06:01:15,937] [INFO] [timer.py:197:stop] 0/5756, RunningAvgSamplesPerSec=6.333048192210894, CurrSamplesPerSec=5.7072176689809595, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 06:01:27,234] [INFO] [timer.py:197:stop] 0/5758, RunningAvgSamplesPerSec=6.333051890604268, CurrSamplesPerSec=5.698058980366412, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 06:01:38,543] [INFO] [logging.py:68:log_dist] [Rank 0] step=2880, skipped=5, lr=[4.724444444444445e-06], mom=[[0.9, 0.999]]
[2022-12-17 06:01:38,545] [INFO] [timer.py:197:stop] 0/5760, RunningAvgSamplesPerSec=6.333052965722199, CurrSamplesPerSec=5.686170522167649, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 06:01:49,806] [INFO] [timer.py:197:stop] 0/5762, RunningAvgSamplesPerSec=6.333061178432799, CurrSamplesPerSec=5.7263543044379785, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 06:02:01,134] [INFO] [timer.py:197:stop] 0/5764, RunningAvgSamplesPerSec=6.333058222287947, CurrSamplesPerSec=5.685659145615068, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 06:02:12,459] [INFO] [timer.py:197:stop] 0/5766, RunningAvgSamplesPerSec=6.333056332696115, CurrSamplesPerSec=5.68130040403441, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 06:02:23,762] [INFO] [timer.py:197:stop] 0/5768, RunningAvgSamplesPerSec=6.33305904526347, CurrSamplesPerSec=5.6914582876511375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 06:02:35,061] [INFO] [timer.py:197:stop] 0/5770, RunningAvgSamplesPerSec=6.333062133887484, CurrSamplesPerSec=5.714090461007016, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 06:02:46,746] [INFO] [timer.py:197:stop] 0/5772, RunningAvgSamplesPerSec=6.332981545034309, CurrSamplesPerSec=5.330093869140605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 06:02:58,054] [INFO] [timer.py:197:stop] 0/5774, RunningAvgSamplesPerSec=6.332982922697146, CurrSamplesPerSec=5.684175638494878, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 06:03:09,529] [INFO] [timer.py:197:stop] 0/5776, RunningAvgSamplesPerSec=6.332982962694763, CurrSamplesPerSec=5.70664256871121, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 06:03:20,858] [INFO] [timer.py:197:stop] 0/5778, RunningAvgSamplesPerSec=6.332978596550738, CurrSamplesPerSec=5.674524450355167, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 06:03:32,169] [INFO] [logging.py:68:log_dist] [Rank 0] step=2890, skipped=5, lr=[4.7022222222222225e-06], mom=[[0.9, 0.999]]
[2022-12-17 06:03:32,170] [INFO] [timer.py:197:stop] 0/5780, RunningAvgSamplesPerSec=6.332980005918211, CurrSamplesPerSec=5.709917092489247, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 06:03:43,748] [INFO] [timer.py:197:stop] 0/5782, RunningAvgSamplesPerSec=6.33297547290769, CurrSamplesPerSec=5.677242016650395, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 06:03:55,179] [INFO] [timer.py:197:stop] 0/5784, RunningAvgSamplesPerSec=6.332969718676225, CurrSamplesPerSec=5.670225157206506, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 06:04:06,469] [INFO] [timer.py:197:stop] 0/5786, RunningAvgSamplesPerSec=6.33297987413485, CurrSamplesPerSec=5.703800073451162, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 06:04:17,960] [INFO] [timer.py:197:stop] 0/5788, RunningAvgSamplesPerSec=6.332987330191376, CurrSamplesPerSec=5.696377756282663, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 06:04:29,355] [INFO] [timer.py:197:stop] 0/5790, RunningAvgSamplesPerSec=6.332994329889643, CurrSamplesPerSec=5.71702041843999, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 06:04:40,697] [INFO] [timer.py:197:stop] 0/5792, RunningAvgSamplesPerSec=6.3329883259810735, CurrSamplesPerSec=5.636968860406683, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 06:04:51,944] [INFO] [timer.py:197:stop] 0/5794, RunningAvgSamplesPerSec=6.333002093160635, CurrSamplesPerSec=5.73800064443515, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 06:05:03,355] [INFO] [timer.py:197:stop] 0/5796, RunningAvgSamplesPerSec=6.332997662056815, CurrSamplesPerSec=5.709971991224141, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 06:05:14,590] [INFO] [timer.py:197:stop] 0/5798,
RunningAvgSamplesPerSec=6.3330116405150365, CurrSamplesPerSec=5.737638347737378, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:05:26,187] [INFO] [logging.py:68:log_dist] [Rank 0] step=2900, skipped=5, lr=[4.680000000000001e-06], mom=[[0.9, 0.999]] [2022-12-17 06:05:26,189] [INFO] [timer.py:197:stop] 0/5800, RunningAvgSamplesPerSec=6.333014348129561, CurrSamplesPerSec=5.693146992655114, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0008, 'learning_rate': 4.680000000000001e-06, 'epoch': 12.29} [2022-12-17 06:05:37,429] [INFO] [timer.py:197:stop] 0/5802, RunningAvgSamplesPerSec=6.333029910069844, CurrSamplesPerSec=5.731119927273069, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:05:48,682] [INFO] [timer.py:197:stop] 0/5804, RunningAvgSamplesPerSec=6.333035842662751, CurrSamplesPerSec=5.704081019890971, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:06:00,097] [INFO] [timer.py:197:stop] 0/5806, RunningAvgSamplesPerSec=6.333042504696337, CurrSamplesPerSec=5.712226909053271, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:06:11,623] [INFO] [timer.py:197:stop] 0/5808, RunningAvgSamplesPerSec=6.333043987856465, CurrSamplesPerSec=5.68961477239248, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:06:22,901] [INFO] [timer.py:197:stop] 0/5810, RunningAvgSamplesPerSec=6.33304796554682, CurrSamplesPerSec=5.692458594636181, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:06:34,200] [INFO] [timer.py:197:stop] 0/5812, RunningAvgSamplesPerSec=6.333053093884182, CurrSamplesPerSec=5.704069383941154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:06:45,832] [INFO] [timer.py:197:stop] 0/5814, RunningAvgSamplesPerSec=6.333036366855897, CurrSamplesPerSec=5.698952715852549, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:06:57,143] [INFO] [timer.py:197:stop] 0/5816, RunningAvgSamplesPerSec=6.333036955464624, CurrSamplesPerSec=5.704274232079576, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-17 06:07:08,666] [INFO] [timer.py:197:stop] 0/5818, RunningAvgSamplesPerSec=6.33303540294264, CurrSamplesPerSec=5.693990392427025, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:07:20,167] [INFO] [logging.py:68:log_dist] [Rank 0] step=2910, skipped=5, lr=[4.6577777777777785e-06], mom=[[0.9, 0.999]] [2022-12-17 06:07:20,169] [INFO] [timer.py:197:stop] 0/5820, RunningAvgSamplesPerSec=6.333035206874964, CurrSamplesPerSec=5.698076155639671, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:07:31,475] [INFO] [timer.py:197:stop] 0/5822, RunningAvgSamplesPerSec=6.333036966743757, CurrSamplesPerSec=5.705704457700638, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:07:42,769] [INFO] [timer.py:197:stop] 0/5824, RunningAvgSamplesPerSec=6.333041189439793, CurrSamplesPerSec=5.708153121779355, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:07:54,295] [INFO] [timer.py:197:stop] 0/5826, RunningAvgSamplesPerSec=6.333039135741505, CurrSamplesPerSec=5.678836995791065, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:08:05,618] [INFO] [timer.py:197:stop] 0/5828, RunningAvgSamplesPerSec=6.333037473781379, CurrSamplesPerSec=5.682891167955098, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:08:16,973] [INFO] [timer.py:197:stop] 0/5830, RunningAvgSamplesPerSec=6.333024679570001, CurrSamplesPerSec=5.645582961104968, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:08:28,288] [INFO] [timer.py:197:stop] 0/5832, RunningAvgSamplesPerSec=6.333023503984594, CurrSamplesPerSec=5.709940412176869, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:08:39,578] [INFO] [timer.py:197:stop] 0/5834, RunningAvgSamplesPerSec=6.333028469659971, CurrSamplesPerSec=5.7119590155108995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:08:50,859] [INFO] [timer.py:197:stop] 0/5836, RunningAvgSamplesPerSec=6.333035625576971, CurrSamplesPerSec=5.6891756027981115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 
06:09:02,335] [INFO] [timer.py:197:stop] 0/5838, RunningAvgSamplesPerSec=6.333039610848032, CurrSamplesPerSec=5.70028851881641, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:09:13,596] [INFO] [logging.py:68:log_dist] [Rank 0] step=2920, skipped=5, lr=[4.635555555555556e-06], mom=[[0.9, 0.999]] [2022-12-17 06:09:13,598] [INFO] [timer.py:197:stop] 0/5840, RunningAvgSamplesPerSec=6.333048189129644, CurrSamplesPerSec=5.726195505402264, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:09:24,852] [INFO] [timer.py:197:stop] 0/5842, RunningAvgSamplesPerSec=6.333057708895621, CurrSamplesPerSec=5.724060147746541, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:09:36,268] [INFO] [timer.py:197:stop] 0/5844, RunningAvgSamplesPerSec=6.333058364260927, CurrSamplesPerSec=5.700625049162459, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:09:47,565] [INFO] [timer.py:197:stop] 0/5846, RunningAvgSamplesPerSec=6.333062597156241, CurrSamplesPerSec=5.706827461725405, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:09:58,850] [INFO] [timer.py:197:stop] 0/5848, RunningAvgSamplesPerSec=6.3330712101826085, CurrSamplesPerSec=5.722144956467561, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:10:10,241] [INFO] [timer.py:197:stop] 0/5850, RunningAvgSamplesPerSec=6.333076512104131, CurrSamplesPerSec=5.716131962346995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.001, 'learning_rate': 4.624444444444445e-06, 'epoch': 12.39} [2022-12-17 06:10:21,549] [INFO] [timer.py:197:stop] 0/5852, RunningAvgSamplesPerSec=6.333077908595145, CurrSamplesPerSec=5.686055135248809, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:10:32,841] [INFO] [timer.py:197:stop] 0/5854, RunningAvgSamplesPerSec=6.333082986621699, CurrSamplesPerSec=5.695888472429112, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:10:44,226] [INFO] [timer.py:197:stop] 0/5856, RunningAvgSamplesPerSec=6.333068188018257, 
CurrSamplesPerSec=5.685718877744978, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:10:55,591] [INFO] [timer.py:197:stop] 0/5858, RunningAvgSamplesPerSec=6.333058188138381, CurrSamplesPerSec=5.6309342973964815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:11:06,930] [INFO] [logging.py:68:log_dist] [Rank 0] step=2930, skipped=5, lr=[4.613333333333334e-06], mom=[[0.9, 0.999]] [2022-12-17 06:11:06,931] [INFO] [timer.py:197:stop] 0/5860, RunningAvgSamplesPerSec=6.333059374007877, CurrSamplesPerSec=5.7035758694456495, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:11:18,215] [INFO] [timer.py:197:stop] 0/5862, RunningAvgSamplesPerSec=6.333066234612839, CurrSamplesPerSec=5.7189429022784335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:11:29,480] [INFO] [timer.py:197:stop] 0/5864, RunningAvgSamplesPerSec=6.333073388154316, CurrSamplesPerSec=5.700475421412082, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:11:40,883] [INFO] [timer.py:197:stop] 0/5866, RunningAvgSamplesPerSec=6.333075174763755, CurrSamplesPerSec=5.705023932007137, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:11:52,463] [INFO] [timer.py:197:stop] 0/5868, RunningAvgSamplesPerSec=6.33308124388736, CurrSamplesPerSec=5.7109671530431925, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:12:03,942] [INFO] [timer.py:197:stop] 0/5870, RunningAvgSamplesPerSec=6.3330868665001185, CurrSamplesPerSec=5.7183637312413875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:12:15,485] [INFO] [timer.py:197:stop] 0/5872, RunningAvgSamplesPerSec=6.333047419957166, CurrSamplesPerSec=5.499836215038959, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:12:26,779] [INFO] [timer.py:197:stop] 0/5874, RunningAvgSamplesPerSec=6.333052248890246, CurrSamplesPerSec=5.716640313397472, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:12:38,045] [INFO] [timer.py:197:stop] 0/5876, RunningAvgSamplesPerSec=6.333063104667585, 
CurrSamplesPerSec=5.7382314883322785, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:12:49,591] [INFO] [timer.py:197:stop] 0/5878, RunningAvgSamplesPerSec=6.333007492183222, CurrSamplesPerSec=5.444904238732139, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:13:00,893] [INFO] [logging.py:68:log_dist] [Rank 0] step=2940, skipped=5, lr=[4.591111111111111e-06], mom=[[0.9, 0.999]] [2022-12-17 06:13:00,895] [INFO] [timer.py:197:stop] 0/5880, RunningAvgSamplesPerSec=6.333009913626235, CurrSamplesPerSec=5.695302844432262, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:13:12,177] [INFO] [timer.py:197:stop] 0/5882, RunningAvgSamplesPerSec=6.333018136040713, CurrSamplesPerSec=5.718970438362116, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:13:23,901] [INFO] [timer.py:197:stop] 0/5884, RunningAvgSamplesPerSec=6.333017939189659, CurrSamplesPerSec=5.704586259823864, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:13:35,218] [INFO] [timer.py:197:stop] 0/5886, RunningAvgSamplesPerSec=6.333018089438299, CurrSamplesPerSec=5.71238542051308, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:13:46,725] [INFO] [timer.py:197:stop] 0/5888, RunningAvgSamplesPerSec=6.333024751875963, CurrSamplesPerSec=5.699916444826087, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:13:58,297] [INFO] [timer.py:197:stop] 0/5890, RunningAvgSamplesPerSec=6.333027057896281, CurrSamplesPerSec=5.704002478150968, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:14:09,601] [INFO] [timer.py:197:stop] 0/5892, RunningAvgSamplesPerSec=6.333029796735478, CurrSamplesPerSec=5.7104729302110275, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:14:20,923] [INFO] [timer.py:197:stop] 0/5894, RunningAvgSamplesPerSec=6.333037543211003, CurrSamplesPerSec=5.718080401695344, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:14:32,249] [INFO] [timer.py:197:stop] 0/5896, RunningAvgSamplesPerSec=6.333035956380963, 
CurrSamplesPerSec=5.708275476630145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:14:43,527] [INFO] [timer.py:197:stop] 0/5898, RunningAvgSamplesPerSec=6.3330439846194055, CurrSamplesPerSec=5.728124165914754, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:14:54,832] [INFO] [logging.py:68:log_dist] [Rank 0] step=2950, skipped=5, lr=[4.568888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 06:14:54,834] [INFO] [timer.py:197:stop] 0/5900, RunningAvgSamplesPerSec=6.333046513333144, CurrSamplesPerSec=5.708106754545724, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0012, 'learning_rate': 4.568888888888889e-06, 'epoch': 12.5} [2022-12-17 06:15:06,151] [INFO] [timer.py:197:stop] 0/5902, RunningAvgSamplesPerSec=6.333046281630883, CurrSamplesPerSec=5.713112206069512, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:15:17,464] [INFO] [timer.py:197:stop] 0/5904, RunningAvgSamplesPerSec=6.333052801023202, CurrSamplesPerSec=5.71561007143143, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:15:28,833] [INFO] [timer.py:197:stop] 0/5906, RunningAvgSamplesPerSec=6.333035290080761, CurrSamplesPerSec=5.608293863527618, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:15:40,572] [INFO] [timer.py:197:stop] 0/5908, RunningAvgSamplesPerSec=6.333040218050896, CurrSamplesPerSec=5.708095102165659, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:15:51,848] [INFO] [timer.py:197:stop] 0/5910, RunningAvgSamplesPerSec=6.333052727101973, CurrSamplesPerSec=5.736807960714468, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:16:03,374] [INFO] [timer.py:197:stop] 0/5912, RunningAvgSamplesPerSec=6.333008477793644, CurrSamplesPerSec=5.4947222221881065, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:16:14,794] [INFO] [timer.py:197:stop] 0/5914, RunningAvgSamplesPerSec=6.333011937522531, CurrSamplesPerSec=5.707201651961366, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:16:26,254] [INFO] 
[timer.py:197:stop] 0/5916, RunningAvgSamplesPerSec=6.333016941064426, CurrSamplesPerSec=5.697815392920308, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:16:37,755] [INFO] [timer.py:197:stop] 0/5918, RunningAvgSamplesPerSec=6.332988295785954, CurrSamplesPerSec=5.547189727597872, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:16:49,165] [INFO] [logging.py:68:log_dist] [Rank 0] step=2960, skipped=5, lr=[4.546666666666667e-06], mom=[[0.9, 0.999]] [2022-12-17 06:16:49,166] [INFO] [timer.py:197:stop] 0/5920, RunningAvgSamplesPerSec=6.332992632489854, CurrSamplesPerSec=5.715456978790322, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:17:00,543] [INFO] [timer.py:197:stop] 0/5922, RunningAvgSamplesPerSec=6.332990709493712, CurrSamplesPerSec=5.685131726842036, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:17:11,794] [INFO] [timer.py:197:stop] 0/5924, RunningAvgSamplesPerSec=6.332998285720652, CurrSamplesPerSec=5.722485780413905, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:17:23,464] [INFO] [timer.py:197:stop] 0/5926, RunningAvgSamplesPerSec=6.332999240441912, CurrSamplesPerSec=5.710772029286689, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:17:34,840] [INFO] [timer.py:197:stop] 0/5928, RunningAvgSamplesPerSec=6.33299967952669, CurrSamplesPerSec=5.670417519968683, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:17:46,128] [INFO] [timer.py:197:stop] 0/5930, RunningAvgSamplesPerSec=6.333002653796711, CurrSamplesPerSec=5.716089116964696, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:17:57,453] [INFO] [timer.py:197:stop] 0/5932, RunningAvgSamplesPerSec=6.33300097953948, CurrSamplesPerSec=5.719599697949569, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:18:08,773] [INFO] [timer.py:197:stop] 0/5934, RunningAvgSamplesPerSec=6.333000297221696, CurrSamplesPerSec=5.683653789922553, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:18:20,073] [INFO] 
[timer.py:197:stop] 0/5936, RunningAvgSamplesPerSec=6.3330006554919525, CurrSamplesPerSec=5.685189521385519, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:18:31,402] [INFO] [timer.py:197:stop] 0/5938, RunningAvgSamplesPerSec=6.332996660401324, CurrSamplesPerSec=5.696287580558719, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:18:42,786] [INFO] [logging.py:68:log_dist] [Rank 0] step=2970, skipped=5, lr=[4.524444444444444e-06], mom=[[0.9, 0.999]] [2022-12-17 06:18:42,787] [INFO] [timer.py:197:stop] 0/5940, RunningAvgSamplesPerSec=6.332994887762245, CurrSamplesPerSec=5.695508030136036, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:18:54,075] [INFO] [timer.py:197:stop] 0/5942, RunningAvgSamplesPerSec=6.333001122764453, CurrSamplesPerSec=5.720114516738884, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:19:05,677] [INFO] [timer.py:197:stop] 0/5944, RunningAvgSamplesPerSec=6.333004765281942, CurrSamplesPerSec=5.715796276237501, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:19:17,231] [INFO] [timer.py:197:stop] 0/5946, RunningAvgSamplesPerSec=6.333007590785375, CurrSamplesPerSec=5.718903913586931, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:19:28,821] [INFO] [timer.py:197:stop] 0/5948, RunningAvgSamplesPerSec=6.332955003707633, CurrSamplesPerSec=5.437463012309892, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:19:40,183] [INFO] [timer.py:197:stop] 0/5950, RunningAvgSamplesPerSec=6.332957337549395, CurrSamplesPerSec=5.6944965039264135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0012, 'learning_rate': 4.513333333333333e-06, 'epoch': 12.61} [2022-12-17 06:19:51,474] [INFO] [timer.py:197:stop] 0/5952, RunningAvgSamplesPerSec=6.332965662367068, CurrSamplesPerSec=5.721009088510862, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:20:02,822] [INFO] [timer.py:197:stop] 0/5954, RunningAvgSamplesPerSec=6.332961677367649, CurrSamplesPerSec=5.668551218034829, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:20:14,160] [INFO] [timer.py:197:stop] 0/5956, RunningAvgSamplesPerSec=6.332959969392417, CurrSamplesPerSec=5.677735067371684, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:20:25,505] [INFO] [timer.py:197:stop] 0/5958, RunningAvgSamplesPerSec=6.332957888895088, CurrSamplesPerSec=5.671679580428901, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:20:36,919] [INFO] [logging.py:68:log_dist] [Rank 0] step=2980, skipped=5, lr=[4.502222222222223e-06], mom=[[0.9, 0.999]] [2022-12-17 06:20:36,920] [INFO] [timer.py:197:stop] 0/5960, RunningAvgSamplesPerSec=6.3329590493220325, CurrSamplesPerSec=5.697653335352861, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:20:48,227] [INFO] [timer.py:197:stop] 0/5962, RunningAvgSamplesPerSec=6.332961638674421, CurrSamplesPerSec=5.70620222218292, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:20:59,636] [INFO] [timer.py:197:stop] 0/5964, RunningAvgSamplesPerSec=6.332966611609955, CurrSamplesPerSec=5.718364949399794, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:21:11,169] [INFO] [timer.py:197:stop] 0/5966, RunningAvgSamplesPerSec=6.332969463847246, CurrSamplesPerSec=5.718971413095601, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:21:22,408] [INFO] [timer.py:197:stop] 0/5968, RunningAvgSamplesPerSec=6.332982594177314, CurrSamplesPerSec=5.7341594900877935, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:21:33,667] [INFO] [timer.py:197:stop] 0/5970, RunningAvgSamplesPerSec=6.3329947216668545, CurrSamplesPerSec=5.718728958379074, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:21:45,273] [INFO] [timer.py:197:stop] 0/5972, RunningAvgSamplesPerSec=6.332985761956139, CurrSamplesPerSec=5.712217914034815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:21:56,574] [INFO] [timer.py:197:stop] 0/5974, RunningAvgSamplesPerSec=6.332989670456562, CurrSamplesPerSec=5.711422331639339, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:22:08,107] [INFO] [timer.py:197:stop] 0/5976, RunningAvgSamplesPerSec=6.332995013267368, CurrSamplesPerSec=5.723771859777583, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:22:19,699] [INFO] [timer.py:197:stop] 0/5978, RunningAvgSamplesPerSec=6.332998115014715, CurrSamplesPerSec=5.734231759936096, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:22:31,011] [INFO] [logging.py:68:log_dist] [Rank 0] step=2990, skipped=5, lr=[4.48e-06], mom=[[0.9, 0.999]] [2022-12-17 06:22:31,012] [INFO] [timer.py:197:stop] 0/5980, RunningAvgSamplesPerSec=6.332999323998145, CurrSamplesPerSec=5.665226150268889, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:22:42,389] [INFO] [timer.py:197:stop] 0/5982, RunningAvgSamplesPerSec=6.333005629850955, CurrSamplesPerSec=5.700576867200941, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:22:53,952] [INFO] [timer.py:197:stop] 0/5984, RunningAvgSamplesPerSec=6.33301324936841, CurrSamplesPerSec=5.732907918338295, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:23:05,236] [INFO] [timer.py:197:stop] 0/5986, RunningAvgSamplesPerSec=6.333020150963738, CurrSamplesPerSec=5.709220747511372, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:23:16,563] [INFO] [timer.py:197:stop] 0/5988, RunningAvgSamplesPerSec=6.333027199070639, CurrSamplesPerSec=5.72047850468883, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:23:27,940] [INFO] [timer.py:197:stop] 0/5990, RunningAvgSamplesPerSec=6.333021411493446, CurrSamplesPerSec=5.706681147760475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:23:39,270] [INFO] [timer.py:197:stop] 0/5992, RunningAvgSamplesPerSec=6.333012549087907, CurrSamplesPerSec=5.6532462764159055, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:23:50,584] [INFO] [timer.py:197:stop] 0/5994, RunningAvgSamplesPerSec=6.333012546323928, CurrSamplesPerSec=5.690703945040657, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 06:24:02,141] [INFO] [timer.py:197:stop] 0/5996, RunningAvgSamplesPerSec=6.3329965499097245, CurrSamplesPerSec=5.633968527653969, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:24:13,457] [INFO] [timer.py:197:stop] 0/5998, RunningAvgSamplesPerSec=6.332990861516932, CurrSamplesPerSec=5.682410453102639, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:24:24,871] [INFO] [logging.py:68:log_dist] [Rank 0] step=3000, skipped=5, lr=[4.457777777777778e-06], mom=[[0.9, 0.999]] [2022-12-17 06:24:24,873] [INFO] [timer.py:197:stop] 0/6000, RunningAvgSamplesPerSec=6.332989796557147, CurrSamplesPerSec=5.6977181572736075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0011, 'learning_rate': 4.457777777777778e-06, 'epoch': 12.71} {'eval_loss': 0.191650390625, 'eval_wer': 9.403141746929155, 'eval_runtime': 2117.549, 'eval_samples_per_second': 3.643, 'eval_steps_per_second': 0.456, 'epoch': 12.71} [2022-12-17 06:59:46,120] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step3000 is begin to save! [2022-12-17 06:59:46,130] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-3000/global_step3000/mp_rank_00_model_states.pt [2022-12-17 06:59:46,130] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving ./checkpoint-3000/global_step3000/mp_rank_00_model_states.pt... [2022-12-17 06:59:49,705] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved ./checkpoint-3000/global_step3000/mp_rank_00_model_states.pt. [2022-12-17 06:59:49,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving ./checkpoint-3000/global_step3000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2022-12-17 07:00:04,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved ./checkpoint-3000/global_step3000/zero_pp_rank_0_mp_rank_00_optim_states.pt. 
[2022-12-17 07:00:04,837] [INFO] [engine.py:3269:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-3000/global_step3000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-12-17 07:00:04,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! [2022-12-17 07:02:11,279] [INFO] [timer.py:197:stop] 0/6002, RunningAvgSamplesPerSec=6.3329282976802075, CurrSamplesPerSec=5.409848420510742, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:02:22,520] [INFO] [timer.py:197:stop] 0/6004, RunningAvgSamplesPerSec=6.3329409702913235, CurrSamplesPerSec=5.727924201320441, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:02:33,779] [INFO] [timer.py:197:stop] 0/6006, RunningAvgSamplesPerSec=6.332945347375377, CurrSamplesPerSec=5.715656804046321, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:02:45,055] [INFO] [timer.py:197:stop] 0/6008, RunningAvgSamplesPerSec=6.332949877724141, CurrSamplesPerSec=5.705035329354477, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:02:56,286] [INFO] [timer.py:197:stop] 0/6010, RunningAvgSamplesPerSec=6.332960763399706, CurrSamplesPerSec=5.712214510521705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:03:07,626] [INFO] [timer.py:197:stop] 0/6012, RunningAvgSamplesPerSec=6.332954862322034, CurrSamplesPerSec=5.664988709294761, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:03:19,134] [INFO] [timer.py:197:stop] 0/6014, RunningAvgSamplesPerSec=6.332956478155806, CurrSamplesPerSec=5.691613234976137, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:03:30,422] [INFO] [timer.py:197:stop] 0/6016, RunningAvgSamplesPerSec=6.332958805969218, CurrSamplesPerSec=5.706581425626131, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:03:41,720] [INFO] [timer.py:197:stop] 0/6018, RunningAvgSamplesPerSec=6.332955750474672, CurrSamplesPerSec=5.677985828400935, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:03:53,059] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=3010, skipped=5, lr=[4.4355555555555555e-06], mom=[[0.9, 0.999]] [2022-12-17 07:03:53,061] [INFO] [timer.py:197:stop] 0/6020, RunningAvgSamplesPerSec=6.332947512441389, CurrSamplesPerSec=5.67887616091657, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:04:04,348] [INFO] [timer.py:197:stop] 0/6022, RunningAvgSamplesPerSec=6.332950262424399, CurrSamplesPerSec=5.712307379262475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:04:15,663] [INFO] [timer.py:197:stop] 0/6024, RunningAvgSamplesPerSec=6.33294522532113, CurrSamplesPerSec=5.6699262183186665, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:04:26,966] [INFO] [timer.py:197:stop] 0/6026, RunningAvgSamplesPerSec=6.332944727752335, CurrSamplesPerSec=5.705308394311921, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:04:38,261] [INFO] [timer.py:197:stop] 0/6028, RunningAvgSamplesPerSec=6.332945641278926, CurrSamplesPerSec=5.7136246414556835, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:04:49,577] [INFO] [timer.py:197:stop] 0/6030, RunningAvgSamplesPerSec=6.332943066924493, CurrSamplesPerSec=5.7022036403463865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:05:00,866] [INFO] [timer.py:197:stop] 0/6032, RunningAvgSamplesPerSec=6.332945423624379, CurrSamplesPerSec=5.714399670907678, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:05:12,187] [INFO] [timer.py:197:stop] 0/6034, RunningAvgSamplesPerSec=6.332938365767992, CurrSamplesPerSec=5.655726374511836, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:05:23,507] [INFO] [timer.py:197:stop] 0/6036, RunningAvgSamplesPerSec=6.332939157128889, CurrSamplesPerSec=5.696587613256186, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:05:34,745] [INFO] [timer.py:197:stop] 0/6038, RunningAvgSamplesPerSec=6.332946002518051, CurrSamplesPerSec=5.708284216463061, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:05:46,032] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=3020, skipped=5, lr=[4.413333333333334e-06], mom=[[0.9, 0.999]] [2022-12-17 07:05:46,034] [INFO] [timer.py:197:stop] 0/6040, RunningAvgSamplesPerSec=6.332948857092477, CurrSamplesPerSec=5.705083344211374, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:05:57,250] [INFO] [timer.py:197:stop] 0/6042, RunningAvgSamplesPerSec=6.332963769215607, CurrSamplesPerSec=5.7226780454749555, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:06:08,579] [INFO] [timer.py:197:stop] 0/6044, RunningAvgSamplesPerSec=6.33295502373841, CurrSamplesPerSec=5.639287550383446, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:06:19,844] [INFO] [timer.py:197:stop] 0/6046, RunningAvgSamplesPerSec=6.332962754710209, CurrSamplesPerSec=5.697369877713746, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:06:31,099] [INFO] [timer.py:197:stop] 0/6048, RunningAvgSamplesPerSec=6.3329694666057845, CurrSamplesPerSec=5.7139761275044565, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:06:42,329] [INFO] [timer.py:197:stop] 0/6050, RunningAvgSamplesPerSec=6.332981066028764, CurrSamplesPerSec=5.725672752341051, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.001, 'learning_rate': 4.402222222222223e-06, 'epoch': 12.82} [2022-12-17 07:06:53,602] [INFO] [timer.py:197:stop] 0/6052, RunningAvgSamplesPerSec=6.33299022450167, CurrSamplesPerSec=5.722167644273479, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:07:04,833] [INFO] [timer.py:197:stop] 0/6054, RunningAvgSamplesPerSec=6.333001195047098, CurrSamplesPerSec=5.7341029003935455, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:07:16,058] [INFO] [timer.py:197:stop] 0/6056, RunningAvgSamplesPerSec=6.333013578299217, CurrSamplesPerSec=5.719579224118402, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:07:27,356] [INFO] [timer.py:197:stop] 0/6058, RunningAvgSamplesPerSec=6.33301585082342, CurrSamplesPerSec=5.710740198407118, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:07:38,636] [INFO] [logging.py:68:log_dist] [Rank 0] step=3030, skipped=5, lr=[4.391111111111112e-06], mom=[[0.9, 0.999]] [2022-12-17 07:07:38,637] [INFO] [timer.py:197:stop] 0/6060, RunningAvgSamplesPerSec=6.333019960453741, CurrSamplesPerSec=5.7041610183311, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:07:49,915] [INFO] [timer.py:197:stop] 0/6062, RunningAvgSamplesPerSec=6.333021954913878, CurrSamplesPerSec=5.697483305755978, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:08:01,206] [INFO] [timer.py:197:stop] 0/6064, RunningAvgSamplesPerSec=6.33302445189601, CurrSamplesPerSec=5.690578240712397, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:08:12,512] [INFO] [timer.py:197:stop] 0/6066, RunningAvgSamplesPerSec=6.333023758845632, CurrSamplesPerSec=5.696591481732633, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:08:23,807] [INFO] [timer.py:197:stop] 0/6068, RunningAvgSamplesPerSec=6.33302566180591, CurrSamplesPerSec=5.695524948344975, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:08:35,246] [INFO] [timer.py:197:stop] 0/6070, RunningAvgSamplesPerSec=6.333017665904463, CurrSamplesPerSec=5.65430465557005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:08:46,519] [INFO] [timer.py:197:stop] 0/6072, RunningAvgSamplesPerSec=6.333020403951812, CurrSamplesPerSec=5.721240762130593, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:08:57,822] [INFO] [timer.py:197:stop] 0/6074, RunningAvgSamplesPerSec=6.333017693939095, CurrSamplesPerSec=5.673858777116046, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:09:09,109] [INFO] [timer.py:197:stop] 0/6076, RunningAvgSamplesPerSec=6.333026959977511, CurrSamplesPerSec=5.734302071672384, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:09:20,412] [INFO] [timer.py:197:stop] 0/6078, RunningAvgSamplesPerSec=6.3330330568762045, CurrSamplesPerSec=5.706936413481388, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:09:31,826] [INFO] [logging.py:68:log_dist] [Rank 0] step=3040, skipped=5, lr=[4.368888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 07:09:31,828] [INFO] [timer.py:197:stop] 0/6080, RunningAvgSamplesPerSec=6.333031876153911, CurrSamplesPerSec=5.6814634566546145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:09:43,109] [INFO] [timer.py:197:stop] 0/6082, RunningAvgSamplesPerSec=6.333036190320956, CurrSamplesPerSec=5.707240238570252, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:09:54,383] [INFO] [timer.py:197:stop] 0/6084, RunningAvgSamplesPerSec=6.3330414775741195, CurrSamplesPerSec=5.690312615092251, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:10:05,694] [INFO] [timer.py:197:stop] 0/6086, RunningAvgSamplesPerSec=6.333040218328877, CurrSamplesPerSec=5.67669863130965, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:10:17,049] [INFO] [timer.py:197:stop] 0/6088, RunningAvgSamplesPerSec=6.333038286994798, CurrSamplesPerSec=5.681935592463785, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:10:28,450] [INFO] [timer.py:197:stop] 0/6090, RunningAvgSamplesPerSec=6.33303059688638, CurrSamplesPerSec=5.686027674333798, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:10:39,797] [INFO] [timer.py:197:stop] 0/6092, RunningAvgSamplesPerSec=6.333025536818845, CurrSamplesPerSec=5.65996479562066, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:10:51,237] [INFO] [timer.py:197:stop] 0/6094, RunningAvgSamplesPerSec=6.333013486821785, CurrSamplesPerSec=5.6392449014850685, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:11:02,636] [INFO] [timer.py:197:stop] 0/6096, RunningAvgSamplesPerSec=6.333009741623534, CurrSamplesPerSec=5.685992746360228, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:11:14,051] [INFO] [timer.py:197:stop] 0/6098, RunningAvgSamplesPerSec=6.333002641181561, CurrSamplesPerSec=5.677886386032248, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:11:25,430] [INFO] [logging.py:68:log_dist] [Rank 0] step=3050, skipped=5, lr=[4.346666666666667e-06], mom=[[0.9, 0.999]] [2022-12-17 07:11:25,431] [INFO] [timer.py:197:stop] 0/6100, RunningAvgSamplesPerSec=6.333004553516022, CurrSamplesPerSec=5.699673908755347, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0011, 'learning_rate': 4.346666666666667e-06, 'epoch': 12.92} [2022-12-17 07:11:36,708] [INFO] [timer.py:197:stop] 0/6102, RunningAvgSamplesPerSec=6.3330077852363305, CurrSamplesPerSec=5.688169699341959, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:11:48,001] [INFO] [timer.py:197:stop] 0/6104, RunningAvgSamplesPerSec=6.333009660501565, CurrSamplesPerSec=5.695481202896487, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:11:59,318] [INFO] [timer.py:197:stop] 0/6106, RunningAvgSamplesPerSec=6.333012871056986, CurrSamplesPerSec=5.700673231938462, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:12:10,622] [INFO] [timer.py:197:stop] 0/6108, RunningAvgSamplesPerSec=6.333013058024148, CurrSamplesPerSec=5.679696830576498, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:12:21,970] [INFO] [timer.py:197:stop] 0/6110, RunningAvgSamplesPerSec=6.333018333144457, CurrSamplesPerSec=5.708892186073563, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:12:33,264] [INFO] [timer.py:197:stop] 0/6112, RunningAvgSamplesPerSec=6.333019560946497, CurrSamplesPerSec=5.696909198295653, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:12:44,676] [INFO] [timer.py:197:stop] 0/6114, RunningAvgSamplesPerSec=6.333009485645984, CurrSamplesPerSec=5.6825918544581535, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:12:56,137] [INFO] [timer.py:197:stop] 0/6116, RunningAvgSamplesPerSec=6.333001857085381, CurrSamplesPerSec=5.661122634552742, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:13:07,404] [INFO] [timer.py:197:stop] 0/6118, 
RunningAvgSamplesPerSec=6.333009218347228, CurrSamplesPerSec=5.716704107126596, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:13:18,808] [INFO] [logging.py:68:log_dist] [Rank 0] step=3060, skipped=5, lr=[4.324444444444445e-06], mom=[[0.9, 0.999]] [2022-12-17 07:13:18,810] [INFO] [timer.py:197:stop] 0/6120, RunningAvgSamplesPerSec=6.33299932992847, CurrSamplesPerSec=5.631900206846387, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:13:30,131] [INFO] [timer.py:197:stop] 0/6122, RunningAvgSamplesPerSec=6.333000625987813, CurrSamplesPerSec=5.7233598609995715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:13:41,492] [INFO] [timer.py:197:stop] 0/6124, RunningAvgSamplesPerSec=6.332999071187653, CurrSamplesPerSec=5.69576326408152, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:13:52,896] [INFO] [timer.py:197:stop] 0/6126, RunningAvgSamplesPerSec=6.333000799392758, CurrSamplesPerSec=5.694567535841868, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:14:04,138] [INFO] [timer.py:197:stop] 0/6128, RunningAvgSamplesPerSec=6.333013144982671, CurrSamplesPerSec=5.739862152879895, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:14:15,464] [INFO] [timer.py:197:stop] 0/6130, RunningAvgSamplesPerSec=6.333011721516519, CurrSamplesPerSec=5.682956616907704, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:14:26,778] [INFO] [timer.py:197:stop] 0/6132, RunningAvgSamplesPerSec=6.333017077437072, CurrSamplesPerSec=5.689464033835547, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:14:38,143] [INFO] [timer.py:197:stop] 0/6134, RunningAvgSamplesPerSec=6.333023205191618, CurrSamplesPerSec=5.701417626140478, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:14:46,638] [INFO] [timer.py:197:stop] 0/6136, RunningAvgSamplesPerSec=6.333530345771324, CurrSamplesPerSec=10.181979836122121, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:14:57,900] [INFO] [timer.py:197:stop] 0/6138, 
RunningAvgSamplesPerSec=6.3335354485394, CurrSamplesPerSec=5.704494126868448, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:15:09,204] [INFO] [logging.py:68:log_dist] [Rank 0] step=3070, skipped=5, lr=[4.302222222222223e-06], mom=[[0.9, 0.999]] [2022-12-17 07:15:09,205] [INFO] [timer.py:197:stop] 0/6140, RunningAvgSamplesPerSec=6.333539101452951, CurrSamplesPerSec=5.708758392510282, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:15:20,622] [INFO] [timer.py:197:stop] 0/6142, RunningAvgSamplesPerSec=6.333530860653627, CurrSamplesPerSec=5.651146410158156, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:15:32,024] [INFO] [timer.py:197:stop] 0/6144, RunningAvgSamplesPerSec=6.333533492664965, CurrSamplesPerSec=5.692809654408539, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:15:43,329] [INFO] [timer.py:197:stop] 0/6146, RunningAvgSamplesPerSec=6.333533025155378, CurrSamplesPerSec=5.6757631381319, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:15:54,648] [INFO] [timer.py:197:stop] 0/6148, RunningAvgSamplesPerSec=6.333533683356346, CurrSamplesPerSec=5.689169332882951, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:16:05,989] [INFO] [timer.py:197:stop] 0/6150, RunningAvgSamplesPerSec=6.333535642050272, CurrSamplesPerSec=5.696211429020808, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0011, 'learning_rate': 4.291111111111112e-06, 'epoch': 13.03} [2022-12-17 07:16:17,328] [INFO] [timer.py:197:stop] 0/6152, RunningAvgSamplesPerSec=6.333536210137741, CurrSamplesPerSec=5.689893358266075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:16:28,635] [INFO] [timer.py:197:stop] 0/6154, RunningAvgSamplesPerSec=6.333535803272801, CurrSamplesPerSec=5.704164169830742, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:16:39,912] [INFO] [timer.py:197:stop] 0/6156, RunningAvgSamplesPerSec=6.333549464531601, CurrSamplesPerSec=5.733308802683141, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-17 07:16:51,198] [INFO] [timer.py:197:stop] 0/6158, RunningAvgSamplesPerSec=6.333554253582397, CurrSamplesPerSec=5.711769171508369, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:17:02,479] [INFO] [logging.py:68:log_dist] [Rank 0] step=3080, skipped=5, lr=[4.2800000000000005e-06], mom=[[0.9, 0.999]] [2022-12-17 07:17:02,481] [INFO] [timer.py:197:stop] 0/6160, RunningAvgSamplesPerSec=6.333556008679909, CurrSamplesPerSec=5.691028244421208, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:17:13,718] [INFO] [timer.py:197:stop] 0/6162, RunningAvgSamplesPerSec=6.333570286672101, CurrSamplesPerSec=5.745796062654638, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:17:25,031] [INFO] [timer.py:197:stop] 0/6164, RunningAvgSamplesPerSec=6.3335685820454595, CurrSamplesPerSec=5.7004892217010115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:17:36,288] [INFO] [timer.py:197:stop] 0/6166, RunningAvgSamplesPerSec=6.333575165962192, CurrSamplesPerSec=5.723583670512504, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:17:47,560] [INFO] [timer.py:197:stop] 0/6168, RunningAvgSamplesPerSec=6.333579138734967, CurrSamplesPerSec=5.716975368116917, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:17:58,863] [INFO] [timer.py:197:stop] 0/6170, RunningAvgSamplesPerSec=6.333576663658981, CurrSamplesPerSec=5.682725627333229, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:18:10,131] [INFO] [timer.py:197:stop] 0/6172, RunningAvgSamplesPerSec=6.333581345778832, CurrSamplesPerSec=5.702568259797126, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:18:21,401] [INFO] [timer.py:197:stop] 0/6174, RunningAvgSamplesPerSec=6.33358959330988, CurrSamplesPerSec=5.712314672746968, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:18:32,668] [INFO] [timer.py:197:stop] 0/6176, RunningAvgSamplesPerSec=6.33359884866726, CurrSamplesPerSec=5.723382802499994, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 
07:18:43,931] [INFO] [timer.py:197:stop] 0/6178, RunningAvgSamplesPerSec=6.333606222769196, CurrSamplesPerSec=5.726166922511175, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:18:55,195] [INFO] [logging.py:68:log_dist] [Rank 0] step=3090, skipped=5, lr=[4.257777777777778e-06], mom=[[0.9, 0.999]] [2022-12-17 07:18:55,196] [INFO] [timer.py:197:stop] 0/6180, RunningAvgSamplesPerSec=6.33361665814971, CurrSamplesPerSec=5.721152967814155, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:19:06,466] [INFO] [timer.py:197:stop] 0/6182, RunningAvgSamplesPerSec=6.333620691197549, CurrSamplesPerSec=5.706928405749664, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:19:17,733] [INFO] [timer.py:197:stop] 0/6184, RunningAvgSamplesPerSec=6.3336270853545695, CurrSamplesPerSec=5.704318112706603, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:19:28,980] [INFO] [timer.py:197:stop] 0/6186, RunningAvgSamplesPerSec=6.333636018909267, CurrSamplesPerSec=5.719854901930869, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:19:40,211] [INFO] [timer.py:197:stop] 0/6188, RunningAvgSamplesPerSec=6.3336476936987385, CurrSamplesPerSec=5.715806499597796, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:19:51,493] [INFO] [timer.py:197:stop] 0/6190, RunningAvgSamplesPerSec=6.333649560930875, CurrSamplesPerSec=5.679310617680972, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:20:02,757] [INFO] [timer.py:197:stop] 0/6192, RunningAvgSamplesPerSec=6.3336588565995875, CurrSamplesPerSec=5.72637311659954, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:20:14,044] [INFO] [timer.py:197:stop] 0/6194, RunningAvgSamplesPerSec=6.333664712617201, CurrSamplesPerSec=5.721329778563483, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:20:25,347] [INFO] [timer.py:197:stop] 0/6196, RunningAvgSamplesPerSec=6.333666722865129, CurrSamplesPerSec=5.703944300478775, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 
07:20:36,642] [INFO] [timer.py:197:stop] 0/6198, RunningAvgSamplesPerSec=6.333670839094798, CurrSamplesPerSec=5.695486278300802, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:20:47,924] [INFO] [logging.py:68:log_dist] [Rank 0] step=3100, skipped=5, lr=[4.235555555555556e-06], mom=[[0.9, 0.999]] [2022-12-17 07:20:47,925] [INFO] [timer.py:197:stop] 0/6200, RunningAvgSamplesPerSec=6.333670497347703, CurrSamplesPerSec=5.6847792055710595, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0012, 'learning_rate': 4.235555555555556e-06, 'epoch': 13.14} [2022-12-17 07:20:59,185] [INFO] [timer.py:197:stop] 0/6202, RunningAvgSamplesPerSec=6.333676638307629, CurrSamplesPerSec=5.711012351730619, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:21:10,462] [INFO] [timer.py:197:stop] 0/6204, RunningAvgSamplesPerSec=6.333679227024554, CurrSamplesPerSec=5.718343997147511, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:21:21,830] [INFO] [timer.py:197:stop] 0/6206, RunningAvgSamplesPerSec=6.333662934242267, CurrSamplesPerSec=5.706011547748035, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:21:33,171] [INFO] [timer.py:197:stop] 0/6208, RunningAvgSamplesPerSec=6.333652218039571, CurrSamplesPerSec=5.6182565704722425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:21:44,461] [INFO] [timer.py:197:stop] 0/6210, RunningAvgSamplesPerSec=6.333653845696366, CurrSamplesPerSec=5.702601695675352, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:21:55,746] [INFO] [timer.py:197:stop] 0/6212, RunningAvgSamplesPerSec=6.333656373763029, CurrSamplesPerSec=5.700723594705585, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:22:07,010] [INFO] [timer.py:197:stop] 0/6214, RunningAvgSamplesPerSec=6.333661056272008, CurrSamplesPerSec=5.720521903529705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:22:18,295] [INFO] [timer.py:197:stop] 0/6216, RunningAvgSamplesPerSec=6.333658649564784, 
CurrSamplesPerSec=5.707987562653582, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:22:29,582] [INFO] [timer.py:197:stop] 0/6218, RunningAvgSamplesPerSec=6.333662760426689, CurrSamplesPerSec=5.727622080813498, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:22:40,929] [INFO] [logging.py:68:log_dist] [Rank 0] step=3110, skipped=5, lr=[4.213333333333333e-06], mom=[[0.9, 0.999]] [2022-12-17 07:22:40,930] [INFO] [timer.py:197:stop] 0/6220, RunningAvgSamplesPerSec=6.333655494934498, CurrSamplesPerSec=5.64942046159434, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:22:52,198] [INFO] [timer.py:197:stop] 0/6222, RunningAvgSamplesPerSec=6.333661744484601, CurrSamplesPerSec=5.711456600556849, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:23:03,481] [INFO] [timer.py:197:stop] 0/6224, RunningAvgSamplesPerSec=6.333668847335543, CurrSamplesPerSec=5.705415832894714, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:23:14,768] [INFO] [timer.py:197:stop] 0/6226, RunningAvgSamplesPerSec=6.33367446236183, CurrSamplesPerSec=5.7123348514844405, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:23:26,045] [INFO] [timer.py:197:stop] 0/6228, RunningAvgSamplesPerSec=6.333682561017407, CurrSamplesPerSec=5.717454398993657, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:23:37,302] [INFO] [timer.py:197:stop] 0/6230, RunningAvgSamplesPerSec=6.333695018702657, CurrSamplesPerSec=5.726601804187436, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:23:48,593] [INFO] [timer.py:197:stop] 0/6232, RunningAvgSamplesPerSec=6.333700599672269, CurrSamplesPerSec=5.713283656182034, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:23:59,782] [INFO] [timer.py:197:stop] 0/6234, RunningAvgSamplesPerSec=6.333720792746628, CurrSamplesPerSec=5.753932606427263, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:24:11,040] [INFO] [timer.py:197:stop] 0/6236, RunningAvgSamplesPerSec=6.3337316048765695, 
CurrSamplesPerSec=5.736263165164779, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:24:22,293] [INFO] [timer.py:197:stop] 0/6238, RunningAvgSamplesPerSec=6.333743592965634, CurrSamplesPerSec=5.7273688716713105, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:24:33,556] [INFO] [logging.py:68:log_dist] [Rank 0] step=3120, skipped=5, lr=[4.191111111111111e-06], mom=[[0.9, 0.999]] [2022-12-17 07:24:33,558] [INFO] [timer.py:197:stop] 0/6240, RunningAvgSamplesPerSec=6.333753619937605, CurrSamplesPerSec=5.723275174287382, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:24:44,838] [INFO] [timer.py:197:stop] 0/6242, RunningAvgSamplesPerSec=6.333759832425309, CurrSamplesPerSec=5.707526620904189, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:24:56,106] [INFO] [timer.py:197:stop] 0/6244, RunningAvgSamplesPerSec=6.333768310071723, CurrSamplesPerSec=5.713693475802618, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:25:07,395] [INFO] [timer.py:197:stop] 0/6246, RunningAvgSamplesPerSec=6.333772564543033, CurrSamplesPerSec=5.690272809574608, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:25:18,669] [INFO] [timer.py:197:stop] 0/6248, RunningAvgSamplesPerSec=6.333780481683763, CurrSamplesPerSec=5.726913347117072, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:25:29,909] [INFO] [timer.py:197:stop] 0/6250, RunningAvgSamplesPerSec=6.333792384979605, CurrSamplesPerSec=5.730079078308308, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0007, 'learning_rate': 4.18e-06, 'epoch': 13.24} [2022-12-17 07:25:41,187] [INFO] [timer.py:197:stop] 0/6252, RunningAvgSamplesPerSec=6.3337997647527295, CurrSamplesPerSec=5.725025061804615, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:25:52,455] [INFO] [timer.py:197:stop] 0/6254, RunningAvgSamplesPerSec=6.333808840927152, CurrSamplesPerSec=5.7160613651848236, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:26:03,721] [INFO] [timer.py:197:stop] 
0/6256, RunningAvgSamplesPerSec=6.333819140364993, CurrSamplesPerSec=5.741936852855393, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:26:15,002] [INFO] [timer.py:197:stop] 0/6258, RunningAvgSamplesPerSec=6.3338262787236195, CurrSamplesPerSec=5.726260978399147, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:26:26,272] [INFO] [logging.py:68:log_dist] [Rank 0] step=3130, skipped=5, lr=[4.168888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 07:26:26,274] [INFO] [timer.py:197:stop] 0/6260, RunningAvgSamplesPerSec=6.333834613812638, CurrSamplesPerSec=5.731404795279654, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:26:37,516] [INFO] [timer.py:197:stop] 0/6262, RunningAvgSamplesPerSec=6.333851044396006, CurrSamplesPerSec=5.754929086728061, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:26:48,838] [INFO] [timer.py:197:stop] 0/6264, RunningAvgSamplesPerSec=6.333848800608768, CurrSamplesPerSec=5.688091595166303, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:27:00,106] [INFO] [timer.py:197:stop] 0/6266, RunningAvgSamplesPerSec=6.333856453734544, CurrSamplesPerSec=5.708961877763876, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:27:11,364] [INFO] [timer.py:197:stop] 0/6268, RunningAvgSamplesPerSec=6.333866410344925, CurrSamplesPerSec=5.719594092005034, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:27:22,619] [INFO] [timer.py:197:stop] 0/6270, RunningAvgSamplesPerSec=6.333876826821105, CurrSamplesPerSec=5.723777473914621, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:27:33,875] [INFO] [timer.py:197:stop] 0/6272, RunningAvgSamplesPerSec=6.3338848330656905, CurrSamplesPerSec=5.716193797156096, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:27:45,120] [INFO] [timer.py:197:stop] 0/6274, RunningAvgSamplesPerSec=6.333897185918975, CurrSamplesPerSec=5.74382648026717, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:27:56,403] [INFO] [timer.py:197:stop] 0/6276, 
RunningAvgSamplesPerSec=6.333902031352192, CurrSamplesPerSec=5.709184319767375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:28:07,662] [INFO] [timer.py:197:stop] 0/6278, RunningAvgSamplesPerSec=6.333911149775862, CurrSamplesPerSec=5.7330222760859595, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:28:18,945] [INFO] [logging.py:68:log_dist] [Rank 0] step=3140, skipped=5, lr=[4.146666666666667e-06], mom=[[0.9, 0.999]] [2022-12-17 07:28:18,947] [INFO] [timer.py:197:stop] 0/6280, RunningAvgSamplesPerSec=6.33391300068281, CurrSamplesPerSec=5.699917897200058, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:28:30,221] [INFO] [timer.py:197:stop] 0/6282, RunningAvgSamplesPerSec=6.333919192214186, CurrSamplesPerSec=5.7230972671385505, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:28:41,446] [INFO] [timer.py:197:stop] 0/6284, RunningAvgSamplesPerSec=6.333935240619337, CurrSamplesPerSec=5.736468370782495, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:28:52,726] [INFO] [timer.py:197:stop] 0/6286, RunningAvgSamplesPerSec=6.333940726797692, CurrSamplesPerSec=5.702230046434748, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:29:04,007] [INFO] [timer.py:197:stop] 0/6288, RunningAvgSamplesPerSec=6.333946524808699, CurrSamplesPerSec=5.7043353257336005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:29:15,255] [INFO] [timer.py:197:stop] 0/6290, RunningAvgSamplesPerSec=6.3339564607143695, CurrSamplesPerSec=5.712767148317302, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:29:26,466] [INFO] [timer.py:197:stop] 0/6292, RunningAvgSamplesPerSec=6.333969665950145, CurrSamplesPerSec=5.738719976979741, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:29:37,700] [INFO] [timer.py:197:stop] 0/6294, RunningAvgSamplesPerSec=6.333981120917072, CurrSamplesPerSec=5.711757018032768, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:29:48,954] [INFO] [timer.py:197:stop] 0/6296, 
RunningAvgSamplesPerSec=6.333993811173233, CurrSamplesPerSec=5.731998850847594, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:30:00,190] [INFO] [timer.py:197:stop] 0/6298, RunningAvgSamplesPerSec=6.334008001142226, CurrSamplesPerSec=5.7244090125781755, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:30:11,437] [INFO] [logging.py:68:log_dist] [Rank 0] step=3150, skipped=5, lr=[4.124444444444445e-06], mom=[[0.9, 0.999]] [2022-12-17 07:30:11,439] [INFO] [timer.py:197:stop] 0/6300, RunningAvgSamplesPerSec=6.334020339685583, CurrSamplesPerSec=5.74459939878852, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0007, 'learning_rate': 4.124444444444445e-06, 'epoch': 13.35} [2022-12-17 07:30:22,694] [INFO] [timer.py:197:stop] 0/6302, RunningAvgSamplesPerSec=6.334030713722951, CurrSamplesPerSec=5.7419351333448185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:30:33,920] [INFO] [timer.py:197:stop] 0/6304, RunningAvgSamplesPerSec=6.334044649015982, CurrSamplesPerSec=5.722702201516716, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:30:45,181] [INFO] [timer.py:197:stop] 0/6306, RunningAvgSamplesPerSec=6.334050139124009, CurrSamplesPerSec=5.702820249985925, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:30:56,440] [INFO] [timer.py:197:stop] 0/6308, RunningAvgSamplesPerSec=6.3340591029131605, CurrSamplesPerSec=5.72185320243758, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:31:07,721] [INFO] [timer.py:197:stop] 0/6310, RunningAvgSamplesPerSec=6.334064661655033, CurrSamplesPerSec=5.72226888775065, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:31:19,006] [INFO] [timer.py:197:stop] 0/6312, RunningAvgSamplesPerSec=6.334071341269889, CurrSamplesPerSec=5.709138664317977, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:31:30,271] [INFO] [timer.py:197:stop] 0/6314, RunningAvgSamplesPerSec=6.334080005450078, CurrSamplesPerSec=5.70330321265174, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-17 07:31:41,558] [INFO] [timer.py:197:stop] 0/6316, RunningAvgSamplesPerSec=6.334084927353801, CurrSamplesPerSec=5.725893078676519, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:31:52,823] [INFO] [timer.py:197:stop] 0/6318, RunningAvgSamplesPerSec=6.334092224371551, CurrSamplesPerSec=5.735803528437858, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:32:04,092] [INFO] [logging.py:68:log_dist] [Rank 0] step=3160, skipped=5, lr=[4.102222222222222e-06], mom=[[0.9, 0.999]] [2022-12-17 07:32:04,094] [INFO] [timer.py:197:stop] 0/6320, RunningAvgSamplesPerSec=6.3340985988673655, CurrSamplesPerSec=5.723873648233309, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:32:15,331] [INFO] [timer.py:197:stop] 0/6322, RunningAvgSamplesPerSec=6.334108187268311, CurrSamplesPerSec=5.730364332160457, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:32:26,579] [INFO] [timer.py:197:stop] 0/6324, RunningAvgSamplesPerSec=6.334119480956959, CurrSamplesPerSec=5.7376999128728645, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:32:37,841] [INFO] [timer.py:197:stop] 0/6326, RunningAvgSamplesPerSec=6.334127634300717, CurrSamplesPerSec=5.735759897419527, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:32:49,119] [INFO] [timer.py:197:stop] 0/6328, RunningAvgSamplesPerSec=6.3341327590317595, CurrSamplesPerSec=5.711272865637329, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:33:00,383] [INFO] [timer.py:197:stop] 0/6330, RunningAvgSamplesPerSec=6.334137526524288, CurrSamplesPerSec=5.701190703342258, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:33:11,682] [INFO] [timer.py:197:stop] 0/6332, RunningAvgSamplesPerSec=6.334138129547782, CurrSamplesPerSec=5.707132003239794, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:33:22,984] [INFO] [timer.py:197:stop] 0/6334, RunningAvgSamplesPerSec=6.334138679363636, CurrSamplesPerSec=5.695152529306642, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-17 07:33:34,221] [INFO] [timer.py:197:stop] 0/6336, RunningAvgSamplesPerSec=6.334148557977573, CurrSamplesPerSec=5.72934625780531, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:33:45,505] [INFO] [timer.py:197:stop] 0/6338, RunningAvgSamplesPerSec=6.33415256108004, CurrSamplesPerSec=5.696907263841009, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:33:56,784] [INFO] [logging.py:68:log_dist] [Rank 0] step=3170, skipped=5, lr=[4.08e-06], mom=[[0.9, 0.999]] [2022-12-17 07:33:56,786] [INFO] [timer.py:197:stop] 0/6340, RunningAvgSamplesPerSec=6.334156801115452, CurrSamplesPerSec=5.69057727563347, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:34:08,066] [INFO] [timer.py:197:stop] 0/6342, RunningAvgSamplesPerSec=6.3341588107996945, CurrSamplesPerSec=5.689759247721567, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:34:19,371] [INFO] [timer.py:197:stop] 0/6344, RunningAvgSamplesPerSec=6.3341594637141885, CurrSamplesPerSec=5.699542483160544, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:34:30,652] [INFO] [timer.py:197:stop] 0/6346, RunningAvgSamplesPerSec=6.334164233327315, CurrSamplesPerSec=5.720801817491398, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:34:41,916] [INFO] [timer.py:197:stop] 0/6348, RunningAvgSamplesPerSec=6.334172213651666, CurrSamplesPerSec=5.717038438768021, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:34:53,200] [INFO] [timer.py:197:stop] 0/6350, RunningAvgSamplesPerSec=6.334176607542711, CurrSamplesPerSec=5.696459231174649, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0007, 'learning_rate': 4.0688888888888896e-06, 'epoch': 13.45} [2022-12-17 07:35:04,420] [INFO] [timer.py:197:stop] 0/6352, RunningAvgSamplesPerSec=6.334190474942335, CurrSamplesPerSec=5.750339362309045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:35:15,687] [INFO] [timer.py:197:stop] 0/6354, RunningAvgSamplesPerSec=6.334197592289525, 
CurrSamplesPerSec=5.708686763325248, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:35:26,965] [INFO] [timer.py:197:stop] 0/6356, RunningAvgSamplesPerSec=6.334202876382074, CurrSamplesPerSec=5.696194506733333, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:35:38,242] [INFO] [timer.py:197:stop] 0/6358, RunningAvgSamplesPerSec=6.334208646005777, CurrSamplesPerSec=5.726033539458956, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:35:49,490] [INFO] [logging.py:68:log_dist] [Rank 0] step=3180, skipped=5, lr=[4.057777777777778e-06], mom=[[0.9, 0.999]] [2022-12-17 07:35:49,491] [INFO] [timer.py:197:stop] 0/6360, RunningAvgSamplesPerSec=6.334213859842294, CurrSamplesPerSec=5.705084314216812, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:36:00,801] [INFO] [timer.py:197:stop] 0/6362, RunningAvgSamplesPerSec=6.334213061028189, CurrSamplesPerSec=5.685610734607999, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:36:12,079] [INFO] [timer.py:197:stop] 0/6364, RunningAvgSamplesPerSec=6.334218711653675, CurrSamplesPerSec=5.71221475362965, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:36:23,314] [INFO] [timer.py:197:stop] 0/6366, RunningAvgSamplesPerSec=6.334226486091336, CurrSamplesPerSec=5.7237674661127995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:36:34,618] [INFO] [timer.py:197:stop] 0/6368, RunningAvgSamplesPerSec=6.334227102289113, CurrSamplesPerSec=5.70363379722427, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:36:45,904] [INFO] [timer.py:197:stop] 0/6370, RunningAvgSamplesPerSec=6.33423109422309, CurrSamplesPerSec=5.699825672921633, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:36:57,178] [INFO] [timer.py:197:stop] 0/6372, RunningAvgSamplesPerSec=6.334237409680017, CurrSamplesPerSec=5.714431056001904, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 07:37:08,435] [INFO] [timer.py:197:stop] 0/6374, RunningAvgSamplesPerSec=6.3342470259604955, 
CurrSamplesPerSec=5.737647422993051, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:37:19,668] [INFO] [timer.py:197:stop] 0/6376, RunningAvgSamplesPerSec=6.334258378697692, CurrSamplesPerSec=5.740801708702291, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:37:30,986] [INFO] [timer.py:197:stop] 0/6378, RunningAvgSamplesPerSec=6.334256167928714, CurrSamplesPerSec=5.688993057462939, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:37:42,214] [INFO] [logging.py:68:log_dist] [Rank 0] step=3190, skipped=5, lr=[4.035555555555556e-06], mom=[[0.9, 0.999]]
[2022-12-17 07:37:42,215] [INFO] [timer.py:197:stop] 0/6380, RunningAvgSamplesPerSec=6.33426873374771, CurrSamplesPerSec=5.7413142117213685, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:37:53,484] [INFO] [timer.py:197:stop] 0/6382, RunningAvgSamplesPerSec=6.334276375997644, CurrSamplesPerSec=5.715985414323117, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:38:04,779] [INFO] [timer.py:197:stop] 0/6384, RunningAvgSamplesPerSec=6.3342791658799245, CurrSamplesPerSec=5.701290963695899, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:38:16,060] [INFO] [timer.py:197:stop] 0/6386, RunningAvgSamplesPerSec=6.334284560101654, CurrSamplesPerSec=5.7080103810826985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:38:27,421] [INFO] [timer.py:197:stop] 0/6388, RunningAvgSamplesPerSec=6.334288852463872, CurrSamplesPerSec=5.715398323846543, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:38:38,696] [INFO] [timer.py:197:stop] 0/6390, RunningAvgSamplesPerSec=6.334293701092073, CurrSamplesPerSec=5.723066274843333, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:38:49,974] [INFO] [timer.py:197:stop] 0/6392, RunningAvgSamplesPerSec=6.334300172161752, CurrSamplesPerSec=5.714413052107243, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:39:01,254] [INFO] [timer.py:197:stop] 0/6394, RunningAvgSamplesPerSec=6.33430533183457, CurrSamplesPerSec=5.714190689032285, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:39:12,552] [INFO] [timer.py:197:stop] 0/6396, RunningAvgSamplesPerSec=6.3343093155646, CurrSamplesPerSec=5.701755500915049, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:39:23,875] [INFO] [timer.py:197:stop] 0/6398, RunningAvgSamplesPerSec=6.334306969316176, CurrSamplesPerSec=5.690710218338804, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:39:35,165] [INFO] [logging.py:68:log_dist] [Rank 0] step=3200, skipped=5, lr=[4.013333333333334e-06], mom=[[0.9, 0.999]]
[2022-12-17 07:39:35,166] [INFO] [timer.py:197:stop] 0/6400, RunningAvgSamplesPerSec=6.33431126623047, CurrSamplesPerSec=5.694712746053772, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0007, 'learning_rate': 4.013333333333334e-06, 'epoch': 13.56}
[2022-12-17 07:39:46,431] [INFO] [timer.py:197:stop] 0/6402, RunningAvgSamplesPerSec=6.3343197508186675, CurrSamplesPerSec=5.718162254982803, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:39:57,734] [INFO] [timer.py:197:stop] 0/6404, RunningAvgSamplesPerSec=6.334320762396656, CurrSamplesPerSec=5.687556494617642, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:40:09,051] [INFO] [timer.py:197:stop] 0/6406, RunningAvgSamplesPerSec=6.334318964649175, CurrSamplesPerSec=5.701802976134947, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:40:20,325] [INFO] [timer.py:197:stop] 0/6408, RunningAvgSamplesPerSec=6.334322601846689, CurrSamplesPerSec=5.715647311422073, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:40:31,575] [INFO] [timer.py:197:stop] 0/6410, RunningAvgSamplesPerSec=6.334331751212934, CurrSamplesPerSec=5.721118338574796, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:40:42,836] [INFO] [timer.py:197:stop] 0/6412, RunningAvgSamplesPerSec=6.334342953194776, CurrSamplesPerSec=5.733783716239972, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:40:54,089] [INFO] [timer.py:197:stop] 0/6414, RunningAvgSamplesPerSec=6.334353856926857, CurrSamplesPerSec=5.734053660919016, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:41:05,379] [INFO] [timer.py:197:stop] 0/6416, RunningAvgSamplesPerSec=6.334357538593181, CurrSamplesPerSec=5.709000245343994, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:41:16,633] [INFO] [timer.py:197:stop] 0/6418, RunningAvgSamplesPerSec=6.334366775817005, CurrSamplesPerSec=5.733293618503807, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:41:27,882] [INFO] [logging.py:68:log_dist] [Rank 0] step=3210, skipped=5, lr=[3.991111111111112e-06], mom=[[0.9, 0.999]]
[2022-12-17 07:41:27,884] [INFO] [timer.py:197:stop] 0/6420, RunningAvgSamplesPerSec=6.334374605304082, CurrSamplesPerSec=5.72185320243758, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:41:39,158] [INFO] [timer.py:197:stop] 0/6422, RunningAvgSamplesPerSec=6.334383346014213, CurrSamplesPerSec=5.7169899789547145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:41:50,454] [INFO] [timer.py:197:stop] 0/6424, RunningAvgSamplesPerSec=6.334385831502931, CurrSamplesPerSec=5.7094944561622984, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:42:01,719] [INFO] [timer.py:197:stop] 0/6426, RunningAvgSamplesPerSec=6.33439479212694, CurrSamplesPerSec=5.7225472647897115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:42:12,981] [INFO] [timer.py:197:stop] 0/6428, RunningAvgSamplesPerSec=6.3344037562703885, CurrSamplesPerSec=5.7429112440335315, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:42:24,242] [INFO] [timer.py:197:stop] 0/6430, RunningAvgSamplesPerSec=6.334409786537378, CurrSamplesPerSec=5.708665153517289, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:42:35,613] [INFO] [timer.py:197:stop] 0/6432, RunningAvgSamplesPerSec=6.334397375003188, CurrSamplesPerSec=5.613414707336939, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:42:46,882] [INFO] [timer.py:197:stop] 0/6434, RunningAvgSamplesPerSec=6.3344080132693055, CurrSamplesPerSec=5.7186039621197216, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:42:58,199] [INFO] [timer.py:197:stop] 0/6436, RunningAvgSamplesPerSec=6.334413883868553, CurrSamplesPerSec=5.711313208535354, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:43:09,484] [INFO] [timer.py:197:stop] 0/6438, RunningAvgSamplesPerSec=6.334418881235654, CurrSamplesPerSec=5.697821198138527, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:43:20,807] [INFO] [logging.py:68:log_dist] [Rank 0] step=3220, skipped=5, lr=[3.96888888888889e-06], mom=[[0.9, 0.999]]
[2022-12-17 07:43:20,808] [INFO] [timer.py:197:stop] 0/6440, RunningAvgSamplesPerSec=6.334424521474871, CurrSamplesPerSec=5.701281760907762, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:43:32,071] [INFO] [timer.py:197:stop] 0/6442, RunningAvgSamplesPerSec=6.334429620798979, CurrSamplesPerSec=5.703727114321208, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:43:43,375] [INFO] [timer.py:197:stop] 0/6444, RunningAvgSamplesPerSec=6.33442966271726, CurrSamplesPerSec=5.703901395205888, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:43:54,693] [INFO] [timer.py:197:stop] 0/6446, RunningAvgSamplesPerSec=6.334435735573179, CurrSamplesPerSec=5.708214784084248, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:44:05,979] [INFO] [timer.py:197:stop] 0/6448, RunningAvgSamplesPerSec=6.334439283399943, CurrSamplesPerSec=5.710361899779623, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:44:17,256] [INFO] [timer.py:197:stop] 0/6450, RunningAvgSamplesPerSec=6.334443402553889, CurrSamplesPerSec=5.689446186919232, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.001, 'learning_rate': 3.9577777777777785e-06, 'epoch': 13.67}
[2022-12-17 07:44:28,529] [INFO] [timer.py:197:stop] 0/6452, RunningAvgSamplesPerSec=6.334449873358714, CurrSamplesPerSec=5.7057466624302915, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:44:39,835] [INFO] [timer.py:197:stop] 0/6454, RunningAvgSamplesPerSec=6.334449871263617, CurrSamplesPerSec=5.665630778327832, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:44:51,094] [INFO] [timer.py:197:stop] 0/6456, RunningAvgSamplesPerSec=6.33445870308113, CurrSamplesPerSec=5.711911371015564, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:45:02,323] [INFO] [timer.py:197:stop] 0/6458, RunningAvgSamplesPerSec=6.33447117020378, CurrSamplesPerSec=5.742178085831228, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:45:13,566] [INFO] [logging.py:68:log_dist] [Rank 0] step=3230, skipped=5, lr=[3.946666666666667e-06], mom=[[0.9, 0.999]]
[2022-12-17 07:45:13,567] [INFO] [timer.py:197:stop] 0/6460, RunningAvgSamplesPerSec=6.334483320833476, CurrSamplesPerSec=5.7602422312557096, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:45:24,855] [INFO] [timer.py:197:stop] 0/6462, RunningAvgSamplesPerSec=6.334487090458282, CurrSamplesPerSec=5.703172588988866, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:45:36,144] [INFO] [timer.py:197:stop] 0/6464, RunningAvgSamplesPerSec=6.334490368647347, CurrSamplesPerSec=5.711960717114719, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:45:47,420] [INFO] [timer.py:197:stop] 0/6466, RunningAvgSamplesPerSec=6.334496714801158, CurrSamplesPerSec=5.722351836883863, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:45:58,691] [INFO] [timer.py:197:stop] 0/6468, RunningAvgSamplesPerSec=6.334500277246964, CurrSamplesPerSec=5.710934347992327, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:46:10,029] [INFO] [timer.py:197:stop] 0/6470, RunningAvgSamplesPerSec=6.33449421971024, CurrSamplesPerSec=5.6767440094893145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:46:21,319] [INFO] [timer.py:197:stop] 0/6472, RunningAvgSamplesPerSec=6.334497096531263, CurrSamplesPerSec=5.714800890829148, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:46:32,622] [INFO] [timer.py:197:stop] 0/6474, RunningAvgSamplesPerSec=6.3345016732355335, CurrSamplesPerSec=5.739231617535441, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:46:43,911] [INFO] [timer.py:197:stop] 0/6476, RunningAvgSamplesPerSec=6.3345069214220695, CurrSamplesPerSec=5.722443327687395, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:46:55,203] [INFO] [timer.py:197:stop] 0/6478, RunningAvgSamplesPerSec=6.334509599437047, CurrSamplesPerSec=5.711947833568165, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:47:06,498] [INFO] [logging.py:68:log_dist] [Rank 0] step=3240, skipped=5, lr=[3.924444444444445e-06], mom=[[0.9, 0.999]]
[2022-12-17 07:47:06,499] [INFO] [timer.py:197:stop] 0/6480, RunningAvgSamplesPerSec=6.334507862901814, CurrSamplesPerSec=5.703892911188816, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:47:17,846] [INFO] [timer.py:197:stop] 0/6482, RunningAvgSamplesPerSec=6.334500014738551, CurrSamplesPerSec=5.675805381111495, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:47:29,169] [INFO] [timer.py:197:stop] 0/6484, RunningAvgSamplesPerSec=6.33449821623731, CurrSamplesPerSec=5.6873121173197685, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:47:40,463] [INFO] [timer.py:197:stop] 0/6486, RunningAvgSamplesPerSec=6.334501226706766, CurrSamplesPerSec=5.689940877367955, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:47:51,756] [INFO] [timer.py:197:stop] 0/6488, RunningAvgSamplesPerSec=6.334501235383202, CurrSamplesPerSec=5.696078470899154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:48:03,015] [INFO] [timer.py:197:stop] 0/6490, RunningAvgSamplesPerSec=6.334510794702494, CurrSamplesPerSec=5.733341375468196, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:48:14,308] [INFO] [timer.py:197:stop] 0/6492, RunningAvgSamplesPerSec=6.334513328568358, CurrSamplesPerSec=5.700342748359202, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:48:25,572] [INFO] [timer.py:197:stop] 0/6494, RunningAvgSamplesPerSec=6.334521013057932, CurrSamplesPerSec=5.73964688643047, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:48:36,884] [INFO] [timer.py:197:stop] 0/6496, RunningAvgSamplesPerSec=6.3345225257764195, CurrSamplesPerSec=5.704793084901866, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:48:48,174] [INFO] [timer.py:197:stop] 0/6498, RunningAvgSamplesPerSec=6.334527174656914, CurrSamplesPerSec=5.710437458312825, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:48:59,439] [INFO] [logging.py:68:log_dist] [Rank 0] step=3250, skipped=5, lr=[3.9022222222222225e-06], mom=[[0.9, 0.999]]
[2022-12-17 07:48:59,441] [INFO] [timer.py:197:stop] 0/6500, RunningAvgSamplesPerSec=6.334536038481936, CurrSamplesPerSec=5.748160360542781, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0006, 'learning_rate': 3.9022222222222225e-06, 'epoch': 13.77}
[2022-12-17 07:49:10,677] [INFO] [timer.py:197:stop] 0/6502, RunningAvgSamplesPerSec=6.334544401376434, CurrSamplesPerSec=5.718898065329056, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:49:21,950] [INFO] [timer.py:197:stop] 0/6504, RunningAvgSamplesPerSec=6.334551849421232, CurrSamplesPerSec=5.719225098781024, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:49:33,208] [INFO] [timer.py:197:stop] 0/6506, RunningAvgSamplesPerSec=6.334561672364955, CurrSamplesPerSec=5.728841513197815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:49:44,451] [INFO] [timer.py:197:stop] 0/6508, RunningAvgSamplesPerSec=6.334570523742911, CurrSamplesPerSec=5.724698585593349, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:49:55,715] [INFO] [timer.py:197:stop] 0/6510, RunningAvgSamplesPerSec=6.3345792677214945, CurrSamplesPerSec=5.7236586030173005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:50:06,968] [INFO] [timer.py:197:stop] 0/6512, RunningAvgSamplesPerSec=6.334590058357528, CurrSamplesPerSec=5.7514309615628045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:50:18,211] [INFO] [timer.py:197:stop] 0/6514, RunningAvgSamplesPerSec=6.334603074445066, CurrSamplesPerSec=5.724986722728296, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:50:29,481] [INFO] [timer.py:197:stop] 0/6516, RunningAvgSamplesPerSec=6.334610934163331, CurrSamplesPerSec=5.715269822617539, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:50:40,752] [INFO] [timer.py:197:stop] 0/6518, RunningAvgSamplesPerSec=6.33461945488174, CurrSamplesPerSec=5.720193746608345, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:50:52,044] [INFO] [logging.py:68:log_dist] [Rank 0] step=3260, skipped=5, lr=[3.88e-06], mom=[[0.9, 0.999]]
[2022-12-17 07:50:52,046] [INFO] [timer.py:197:stop] 0/6520, RunningAvgSamplesPerSec=6.334621780396799, CurrSamplesPerSec=5.6912591854839985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:51:03,283] [INFO] [timer.py:197:stop] 0/6522, RunningAvgSamplesPerSec=6.334632276362033, CurrSamplesPerSec=5.730913880609149, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:51:14,584] [INFO] [timer.py:197:stop] 0/6524, RunningAvgSamplesPerSec=6.334633506617112, CurrSamplesPerSec=5.701871525934077, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:51:25,919] [INFO] [timer.py:197:stop] 0/6526, RunningAvgSamplesPerSec=6.334625725853474, CurrSamplesPerSec=5.659491053787917, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:51:37,204] [INFO] [timer.py:197:stop] 0/6528, RunningAvgSamplesPerSec=6.334630479939639, CurrSamplesPerSec=5.708790201300999, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:51:48,768] [INFO] [timer.py:197:stop] 0/6530, RunningAvgSamplesPerSec=6.334633340870514, CurrSamplesPerSec=5.693425200013472, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:52:00,152] [INFO] [timer.py:197:stop] 0/6532, RunningAvgSamplesPerSec=6.334636467388313, CurrSamplesPerSec=5.699997536839119, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:52:11,436] [INFO] [timer.py:197:stop] 0/6534, RunningAvgSamplesPerSec=6.334641379479118, CurrSamplesPerSec=5.7103485375461345, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:52:22,712] [INFO] [timer.py:197:stop] 0/6536, RunningAvgSamplesPerSec=6.334648530792128, CurrSamplesPerSec=5.719040620022333, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:52:33,970] [INFO] [timer.py:197:stop] 0/6538, RunningAvgSamplesPerSec=6.334656893317896, CurrSamplesPerSec=5.719474176171054, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:52:45,257] [INFO] [logging.py:68:log_dist] [Rank 0] step=3270, skipped=5, lr=[3.857777777777778e-06], mom=[[0.9, 0.999]]
[2022-12-17 07:52:45,258] [INFO] [timer.py:197:stop] 0/6540, RunningAvgSamplesPerSec=6.334657147519695, CurrSamplesPerSec=5.688442357582676, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:52:56,557] [INFO] [timer.py:197:stop] 0/6542, RunningAvgSamplesPerSec=6.334659911914299, CurrSamplesPerSec=5.699580966270709, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:53:07,861] [INFO] [timer.py:197:stop] 0/6544, RunningAvgSamplesPerSec=6.334661752510299, CurrSamplesPerSec=5.7100145020316795, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:53:19,167] [INFO] [timer.py:197:stop] 0/6546, RunningAvgSamplesPerSec=6.334660035834661, CurrSamplesPerSec=5.6919149478993525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:53:30,477] [INFO] [timer.py:197:stop] 0/6548, RunningAvgSamplesPerSec=6.334660203676872, CurrSamplesPerSec=5.688878761592103, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:53:41,778] [INFO] [timer.py:197:stop] 0/6550, RunningAvgSamplesPerSec=6.334663627328773, CurrSamplesPerSec=5.694641468901246, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0007, 'learning_rate': 3.8466666666666665e-06, 'epoch': 13.88}
[2022-12-17 07:53:53,070] [INFO] [timer.py:197:stop] 0/6552, RunningAvgSamplesPerSec=6.334667550444578, CurrSamplesPerSec=5.712499933497748, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:54:04,382] [INFO] [timer.py:197:stop] 0/6554, RunningAvgSamplesPerSec=6.334666729151781, CurrSamplesPerSec=5.688070140999738, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:54:15,653] [INFO] [timer.py:197:stop] 0/6556, RunningAvgSamplesPerSec=6.334670797917401, CurrSamplesPerSec=5.713245960593941, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:54:26,951] [INFO] [timer.py:197:stop] 0/6558, RunningAvgSamplesPerSec=6.334672746645493, CurrSamplesPerSec=5.701442329605782, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:54:38,234] [INFO] [logging.py:68:log_dist] [Rank 0] step=3280, skipped=5, lr=[3.835555555555555e-06], mom=[[0.9, 0.999]]
[2022-12-17 07:54:38,236] [INFO] [timer.py:197:stop] 0/6560, RunningAvgSamplesPerSec=6.334678289214605, CurrSamplesPerSec=5.713492571716677, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:54:49,540] [INFO] [timer.py:197:stop] 0/6562, RunningAvgSamplesPerSec=6.334680555512083, CurrSamplesPerSec=5.716451618907167, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:55:00,820] [INFO] [timer.py:197:stop] 0/6564, RunningAvgSamplesPerSec=6.334688242733154, CurrSamplesPerSec=5.713890258754805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:55:12,112] [INFO] [timer.py:197:stop] 0/6566, RunningAvgSamplesPerSec=6.334691767360219, CurrSamplesPerSec=5.691641956668833, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:55:23,390] [INFO] [timer.py:197:stop] 0/6568, RunningAvgSamplesPerSec=6.334699487992292, CurrSamplesPerSec=5.71009806797846, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:55:34,687] [INFO] [timer.py:197:stop] 0/6570, RunningAvgSamplesPerSec=6.334702403865535, CurrSamplesPerSec=5.699421712497568, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
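The `lr=[...]` values in the `log_dist` lines and the `learning_rate` in the Trainer's loss dictionaries both fall by about 2.22e-08 every 10 steps. That is 1e-5 / 4500 per step, which is what a linear decay would produce if it ran from `learning_rate=1e-5` at `warmup_steps=500` down to zero at `max_steps=5000`. The sketch below is a minimal reproduction of the logged values, not the run's actual scheduler code. The `ds_config.json` scheduler section is not shown, so which implementation produced these numbers is not confirmed here.

The sketch makes two assumptions, and the helper names `decayed_lr` and `scheduler_iteration` are hypothetical:

- The warmup phase is not modeled.
- The value printed at a given `step` corresponds to scheduler iteration `step - skipped - 1`. This offset is inferred by matching the numbers in this log and is not documented anywhere in it.

```python
# Hypothetical sketch: post-warmup linear decay implied by the run's settings
# (learning_rate=1e-5, warmup_steps=500, max_steps=5000).
PEAK_LR = 1e-5
WARMUP_STEPS = 500
TOTAL_STEPS = 5000


def decayed_lr(iteration: int) -> float:
    """Learning rate after warmup, decaying linearly to 0 at TOTAL_STEPS."""
    if iteration < WARMUP_STEPS:
        # The log excerpt only covers the decay phase, so warmup is not modeled.
        raise ValueError("warmup phase is not modeled in this sketch")
    remaining = max(0, TOTAL_STEPS - iteration)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


def scheduler_iteration(step: int, skipped: int) -> int:
    """ASSUMED mapping from the logged step/skipped counters to a zero-based
    scheduler iteration; inferred from the numbers, not from documentation."""
    return step - skipped - 1


for step in (3190, 3200, 3300, 3380):
    it = scheduler_iteration(step, skipped=5)
    print(step, f"{decayed_lr(it):.6e}")
```

With `skipped=5` as logged, the loop prints 4.035556e-06, 4.013333e-06, 3.791111e-06 and 3.613333e-06. These match the `lr` values at steps 3190, 3200, 3300 and 3380 to the printed precision. The Trainer's values at its 25-step logging points (e.g. 3.9577777777777785e-06 near step 3225) fit the same mapping.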
[2022-12-17 07:55:45,980] [INFO] [timer.py:197:stop] 0/6572, RunningAvgSamplesPerSec=6.334709420601028, CurrSamplesPerSec=5.705565720115627, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:55:57,281] [INFO] [timer.py:197:stop] 0/6574, RunningAvgSamplesPerSec=6.334713597124701, CurrSamplesPerSec=5.721482942087314, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:56:08,612] [INFO] [timer.py:197:stop] 0/6576, RunningAvgSamplesPerSec=6.3347120968318675, CurrSamplesPerSec=5.698075429921804, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:56:19,883] [INFO] [timer.py:197:stop] 0/6578, RunningAvgSamplesPerSec=6.334716724978661, CurrSamplesPerSec=5.717221327172613, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:56:31,166] [INFO] [logging.py:68:log_dist] [Rank 0] step=3290, skipped=5, lr=[3.813333333333334e-06], mom=[[0.9, 0.999]]
[2022-12-17 07:56:31,168] [INFO] [timer.py:197:stop] 0/6580, RunningAvgSamplesPerSec=6.334719108089491, CurrSamplesPerSec=5.703603015371135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:56:42,501] [INFO] [timer.py:197:stop] 0/6582, RunningAvgSamplesPerSec=6.334714513118618, CurrSamplesPerSec=5.671661365575104, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:56:53,787] [INFO] [timer.py:197:stop] 0/6584, RunningAvgSamplesPerSec=6.3347188763533175, CurrSamplesPerSec=5.7126526246192935, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:57:05,046] [INFO] [timer.py:197:stop] 0/6586, RunningAvgSamplesPerSec=6.334725509508845, CurrSamplesPerSec=5.709292875815929, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:57:16,350] [INFO] [timer.py:197:stop] 0/6588, RunningAvgSamplesPerSec=6.334726599341239, CurrSamplesPerSec=5.693161964928307, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:57:27,660] [INFO] [timer.py:197:stop] 0/6590, RunningAvgSamplesPerSec=6.334723566293667, CurrSamplesPerSec=5.688183440113247, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
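The timer lines report throughput directly. A rough cross-check against the timestamps can be scripted, and the sketch below does this under one assumption: the second counter in `0/6572` counts micro-batches of `per_device_train_batch_size=32` on the single GPU this run uses. That reading rests on two observations from the log itself. The counter advances by 2 per line, and it reads 6400 at `step=3200`, which is consistent with `gradient_accumulation_steps=2`. The regex is written against the line layout seen here, not against any documented DeepSpeed format. The helper names are hypothetical.

```python
import re
from datetime import datetime

# Regex written against the timer lines as they appear in this log;
# it is not taken from DeepSpeed's documentation.
TIMER_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\] \[INFO\] "
    r"\[timer\.py:\d+:stop\] \d+/(?P<micro>\d+), "
    r"RunningAvgSamplesPerSec=(?P<avg>[\d.]+), "
    r"CurrSamplesPerSec=(?P<curr>[\d.]+)"
)


def parse_timer_lines(text):
    """Return (timestamp, micro_step, running_avg, current) tuples."""
    rows = []
    for m in TIMER_RE.finditer(text):
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
        rows.append((ts, int(m["micro"]), float(m["avg"]), float(m["curr"])))
    return rows


def wallclock_samples_per_sec(rows, per_device_batch=32):
    """Samples/s from timestamps, ASSUMING each counter tick is one
    micro-batch of per_device_batch samples on a single GPU."""
    (t0, m0, _, _), (t1, m1, _, _) = rows[0], rows[-1]
    return (m1 - m0) * per_device_batch / (t1 - t0).total_seconds()


# Three consecutive timer lines copied from this log.
SAMPLE = """\
[2022-12-17 07:55:45,980] [INFO] [timer.py:197:stop] 0/6572, RunningAvgSamplesPerSec=6.334709420601028, CurrSamplesPerSec=5.705565720115627, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:55:57,281] [INFO] [timer.py:197:stop] 0/6574, RunningAvgSamplesPerSec=6.334713597124701, CurrSamplesPerSec=5.721482942087314, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:56:08,612] [INFO] [timer.py:197:stop] 0/6576, RunningAvgSamplesPerSec=6.3347120968318675, CurrSamplesPerSec=5.698075429921804, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
"""

rows = parse_timer_lines(SAMPLE)
mean_curr = sum(r[3] for r in rows) / len(rows)
print(f"{len(rows)} lines, mean CurrSamplesPerSec={mean_curr:.3f}, "
      f"wall-clock={wallclock_samples_per_sec(rows):.3f} samples/s")
```

On this sample the timestamp-based figure comes out near 5.66 samples/s (128 samples over 22.632 s). That is close to the logged `CurrSamplesPerSec` of about 5.71. The higher `RunningAvgSamplesPerSec` (about 6.33) is presumably averaged over a longer window and is not recomputed here.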
[2022-12-17 07:57:38,951] [INFO] [timer.py:197:stop] 0/6592, RunningAvgSamplesPerSec=6.334723994163866, CurrSamplesPerSec=5.709301861624632, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:57:50,223] [INFO] [timer.py:197:stop] 0/6594, RunningAvgSamplesPerSec=6.334728347281103, CurrSamplesPerSec=5.715075621498381, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:58:01,518] [INFO] [timer.py:197:stop] 0/6596, RunningAvgSamplesPerSec=6.334731330666477, CurrSamplesPerSec=5.703867701687004, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:58:12,859] [INFO] [timer.py:197:stop] 0/6598, RunningAvgSamplesPerSec=6.334727303189047, CurrSamplesPerSec=5.682588726750454, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:58:24,162] [INFO] [logging.py:68:log_dist] [Rank 0] step=3300, skipped=5, lr=[3.7911111111111114e-06], mom=[[0.9, 0.999]]
[2022-12-17 07:58:24,163] [INFO] [timer.py:197:stop] 0/6600, RunningAvgSamplesPerSec=6.334726599953564, CurrSamplesPerSec=5.699277714088104, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0006, 'learning_rate': 3.7911111111111114e-06, 'epoch': 13.98}
[2022-12-17 07:58:35,460] [INFO] [timer.py:197:stop] 0/6602, RunningAvgSamplesPerSec=6.334731408166863, CurrSamplesPerSec=5.712972378490019, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:58:46,758] [INFO] [timer.py:197:stop] 0/6604, RunningAvgSamplesPerSec=6.334731845338654, CurrSamplesPerSec=5.707407938636204, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:58:58,047] [INFO] [timer.py:197:stop] 0/6606, RunningAvgSamplesPerSec=6.334734233740369, CurrSamplesPerSec=5.711525382674848, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:59:06,507] [INFO] [timer.py:197:stop] 0/6608, RunningAvgSamplesPerSec=6.335210928607283, CurrSamplesPerSec=10.252419732705375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:59:17,751] [INFO] [timer.py:197:stop] 0/6610, RunningAvgSamplesPerSec=6.335219406395689, CurrSamplesPerSec=5.717978088381363, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:59:29,040] [INFO] [timer.py:197:stop] 0/6612, RunningAvgSamplesPerSec=6.335223829599127, CurrSamplesPerSec=5.703502916051277, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:59:40,379] [INFO] [timer.py:197:stop] 0/6614, RunningAvgSamplesPerSec=6.335217971742134, CurrSamplesPerSec=5.668044202371438, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 07:59:51,657] [INFO] [timer.py:197:stop] 0/6616, RunningAvgSamplesPerSec=6.335221810332066, CurrSamplesPerSec=5.710716629205853, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:00:02,978] [INFO] [timer.py:197:stop] 0/6618, RunningAvgSamplesPerSec=6.335221454769381, CurrSamplesPerSec=5.683671119148693, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:00:14,226] [INFO] [logging.py:68:log_dist] [Rank 0] step=3310, skipped=5, lr=[3.768888888888889e-06], mom=[[0.9, 0.999]]
[2022-12-17 08:00:14,227] [INFO] [timer.py:197:stop] 0/6620, RunningAvgSamplesPerSec=6.335230178121693, CurrSamplesPerSec=5.728086274241019, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:00:25,541] [INFO] [timer.py:197:stop] 0/6622, RunningAvgSamplesPerSec=6.335231508364938, CurrSamplesPerSec=5.714036699364634, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:00:36,864] [INFO] [timer.py:197:stop] 0/6624, RunningAvgSamplesPerSec=6.335230673922582, CurrSamplesPerSec=5.67315585182068, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:00:48,140] [INFO] [timer.py:197:stop] 0/6626, RunningAvgSamplesPerSec=6.3352321982779465, CurrSamplesPerSec=5.700755314020435, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:00:59,469] [INFO] [timer.py:197:stop] 0/6628, RunningAvgSamplesPerSec=6.3352286469104575, CurrSamplesPerSec=5.684976409828763, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:01:10,759] [INFO] [timer.py:197:stop] 0/6630, RunningAvgSamplesPerSec=6.3352306933118365, CurrSamplesPerSec=5.70521526797899, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:01:22,014] [INFO] [timer.py:197:stop] 0/6632, RunningAvgSamplesPerSec=6.335241592226106, CurrSamplesPerSec=5.721563185487121, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:01:33,293] [INFO] [timer.py:197:stop] 0/6634, RunningAvgSamplesPerSec=6.335247727416611, CurrSamplesPerSec=5.7200123743921, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:01:44,582] [INFO] [timer.py:197:stop] 0/6636, RunningAvgSamplesPerSec=6.335252259272921, CurrSamplesPerSec=5.712915476704088, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:01:55,896] [INFO] [timer.py:197:stop] 0/6638, RunningAvgSamplesPerSec=6.335255563434, CurrSamplesPerSec=5.712214267413781, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:02:07,209] [INFO] [logging.py:68:log_dist] [Rank 0] step=3320, skipped=5, lr=[3.7466666666666667e-06], mom=[[0.9, 0.999]]
[2022-12-17 08:02:07,210] [INFO] [timer.py:197:stop] 0/6640, RunningAvgSamplesPerSec=6.335256820084927, CurrSamplesPerSec=5.699738534613269, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:02:18,535] [INFO] [timer.py:197:stop] 0/6642, RunningAvgSamplesPerSec=6.335251033360833, CurrSamplesPerSec=5.684267116288428, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:02:29,824] [INFO] [timer.py:197:stop] 0/6644, RunningAvgSamplesPerSec=6.335255060875743, CurrSamplesPerSec=5.7108925524734895, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:02:41,128] [INFO] [timer.py:197:stop] 0/6646, RunningAvgSamplesPerSec=6.335258861561374, CurrSamplesPerSec=5.713440523809875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:02:52,390] [INFO] [timer.py:197:stop] 0/6648, RunningAvgSamplesPerSec=6.335263542215526, CurrSamplesPerSec=5.708266494051981, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:03:03,906] [INFO] [timer.py:197:stop] 0/6650, RunningAvgSamplesPerSec=6.335263763525754, CurrSamplesPerSec=5.702244582000596, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0005, 'learning_rate': 3.7355555555555555e-06, 'epoch': 14.09}
[2022-12-17 08:03:15,189] [INFO] [timer.py:197:stop] 0/6652, RunningAvgSamplesPerSec=6.335268083035561, CurrSamplesPerSec=5.705293600602794, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:03:26,497] [INFO] [timer.py:197:stop] 0/6654, RunningAvgSamplesPerSec=6.335269735440721, CurrSamplesPerSec=5.711725176171957, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:03:37,786] [INFO] [timer.py:197:stop] 0/6656, RunningAvgSamplesPerSec=6.335274164895733, CurrSamplesPerSec=5.725324466139221, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:03:49,173] [INFO] [timer.py:197:stop] 0/6658, RunningAvgSamplesPerSec=6.335257735532661, CurrSamplesPerSec=5.597545279286038, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:04:00,490] [INFO] [logging.py:68:log_dist] [Rank 0] step=3330, skipped=5, lr=[3.724444444444445e-06], mom=[[0.9, 0.999]]
[2022-12-17 08:04:00,492] [INFO] [timer.py:197:stop] 0/6660, RunningAvgSamplesPerSec=6.33525684867156, CurrSamplesPerSec=5.705849508753836, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:04:11,766] [INFO] [timer.py:197:stop] 0/6662, RunningAvgSamplesPerSec=6.335262246882107, CurrSamplesPerSec=5.711982594968425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:04:23,073] [INFO] [timer.py:197:stop] 0/6664, RunningAvgSamplesPerSec=6.335263028881804, CurrSamplesPerSec=5.69388579921607, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:04:34,384] [INFO] [timer.py:197:stop] 0/6666, RunningAvgSamplesPerSec=6.335262645151982, CurrSamplesPerSec=5.6963992731641575, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:04:45,654] [INFO] [timer.py:197:stop] 0/6668, RunningAvgSamplesPerSec=6.3352687746241605, CurrSamplesPerSec=5.7199967730182735, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:04:56,965] [INFO] [timer.py:197:stop] 0/6670, RunningAvgSamplesPerSec=6.335270488377442, CurrSamplesPerSec=5.689561470325925, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:05:08,276] [INFO] [timer.py:197:stop] 0/6672, RunningAvgSamplesPerSec=6.335270114949652, CurrSamplesPerSec=5.708504663321607, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:05:19,597] [INFO] [timer.py:197:stop] 0/6674, RunningAvgSamplesPerSec=6.335268466370297, CurrSamplesPerSec=5.690876948369703, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:05:31,093] [INFO] [timer.py:197:stop] 0/6676, RunningAvgSamplesPerSec=6.3352654854506785, CurrSamplesPerSec=5.707486816894073, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:05:42,393] [INFO] [timer.py:197:stop] 0/6678, RunningAvgSamplesPerSec=6.335264715430377, CurrSamplesPerSec=5.688169699341959, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:05:53,698] [INFO] [logging.py:68:log_dist] [Rank 0] step=3340, skipped=5, lr=[3.7022222222222227e-06], mom=[[0.9, 0.999]]
[2022-12-17 08:05:53,700] [INFO] [timer.py:197:stop] 0/6680, RunningAvgSamplesPerSec=6.335266006192747, CurrSamplesPerSec=5.705141302615219, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:06:05,038] [INFO] [timer.py:197:stop] 0/6682, RunningAvgSamplesPerSec=6.335261982342033, CurrSamplesPerSec=5.68633843174893, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:06:16,383] [INFO] [timer.py:197:stop] 0/6684, RunningAvgSamplesPerSec=6.335256750990903, CurrSamplesPerSec=5.697760969574792, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:06:27,700] [INFO] [timer.py:197:stop] 0/6686, RunningAvgSamplesPerSec=6.3352566792083405, CurrSamplesPerSec=5.706313818730529, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:06:39,031] [INFO] [timer.py:197:stop] 0/6688, RunningAvgSamplesPerSec=6.335252849571285, CurrSamplesPerSec=5.701574569442484, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:06:50,306] [INFO] [timer.py:197:stop] 0/6690, RunningAvgSamplesPerSec=6.335259835192428, CurrSamplesPerSec=5.716095689794135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:07:01,675] [INFO] [timer.py:197:stop] 0/6692, RunningAvgSamplesPerSec=6.335252075091651, CurrSamplesPerSec=5.65181390420287, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:07:12,987] [INFO] [timer.py:197:stop] 0/6694, RunningAvgSamplesPerSec=6.335255091903392, CurrSamplesPerSec=5.731070739015337, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:07:24,290] [INFO] [timer.py:197:stop] 0/6696, RunningAvgSamplesPerSec=6.335256305080493, CurrSamplesPerSec=5.6998179271863325, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:07:35,587] [INFO] [timer.py:197:stop] 0/6698, RunningAvgSamplesPerSec=6.335258540880882, CurrSamplesPerSec=5.696401449037628, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:07:46,914] [INFO] [logging.py:68:log_dist] [Rank 0] step=3350, skipped=5, lr=[3.6800000000000003e-06], mom=[[0.9, 0.999]]
[2022-12-17 08:07:46,916] [INFO] [timer.py:197:stop] 0/6700, RunningAvgSamplesPerSec=6.335257708318872, CurrSamplesPerSec=5.687760881501747, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0005, 'learning_rate': 3.6800000000000003e-06, 'epoch': 14.19}
[2022-12-17 08:07:58,231] [INFO] [timer.py:197:stop] 0/6702, RunningAvgSamplesPerSec=6.33525974504948, CurrSamplesPerSec=5.714890436985326, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:08:09,564] [INFO] [timer.py:197:stop] 0/6704, RunningAvgSamplesPerSec=6.335257964839845, CurrSamplesPerSec=5.67029342892191, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:08:20,858] [INFO] [timer.py:197:stop] 0/6706, RunningAvgSamplesPerSec=6.33526392682798, CurrSamplesPerSec=5.707362068898067, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:08:32,139] [INFO] [timer.py:197:stop] 0/6708, RunningAvgSamplesPerSec=6.33527052873353, CurrSamplesPerSec=5.71907424934495, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:08:43,389] [INFO] [timer.py:197:stop] 0/6710, RunningAvgSamplesPerSec=6.335281135421937, CurrSamplesPerSec=5.738554357497209, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:08:54,673] [INFO] [timer.py:197:stop] 0/6712, RunningAvgSamplesPerSec=6.335286597623496, CurrSamplesPerSec=5.707688997892389, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:09:05,934] [INFO] [timer.py:197:stop] 0/6714, RunningAvgSamplesPerSec=6.335290896389518, CurrSamplesPerSec=5.718155921013812, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:09:17,219] [INFO] [timer.py:197:stop] 0/6716, RunningAvgSamplesPerSec=6.335293824979662, CurrSamplesPerSec=5.723692042585808, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:09:28,508] [INFO] [timer.py:197:stop] 0/6718, RunningAvgSamplesPerSec=6.33529850860076, CurrSamplesPerSec=5.698207513618382, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:09:39,774] [INFO] [logging.py:68:log_dist] [Rank 0] step=3360, skipped=5, lr=[3.657777777777778e-06], mom=[[0.9, 0.999]]
[2022-12-17 08:09:39,776] [INFO] [timer.py:197:stop] 0/6720, RunningAvgSamplesPerSec=6.335304443900086, CurrSamplesPerSec=5.721201742101746, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:09:50,989] [INFO] [timer.py:197:stop] 0/6722, RunningAvgSamplesPerSec=6.335321342992387, CurrSamplesPerSec=5.75129787762289, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:10:02,250] [INFO] [timer.py:197:stop] 0/6724, RunningAvgSamplesPerSec=6.335331044349278, CurrSamplesPerSec=5.7417568011259545, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:10:13,520] [INFO] [timer.py:197:stop] 0/6726, RunningAvgSamplesPerSec=6.335336258451266, CurrSamplesPerSec=5.7119300884010835, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:10:24,824] [INFO] [timer.py:197:stop] 0/6728, RunningAvgSamplesPerSec=6.3353355264456335, CurrSamplesPerSec=5.691331825993934, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:10:36,091] [INFO] [timer.py:197:stop] 0/6730, RunningAvgSamplesPerSec=6.335344092952332, CurrSamplesPerSec=5.740056569296319, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:10:47,363] [INFO] [timer.py:197:stop] 0/6732, RunningAvgSamplesPerSec=6.33535210505159, CurrSamplesPerSec=5.723860954990893, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:10:58,619] [INFO] [timer.py:197:stop] 0/6734, RunningAvgSamplesPerSec=6.33535971186609, CurrSamplesPerSec=5.7344738159501825, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:11:09,901] [INFO] [timer.py:197:stop] 0/6736, RunningAvgSamplesPerSec=6.335368220679748, CurrSamplesPerSec=5.742782485543321, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:11:21,153] [INFO] [timer.py:197:stop] 0/6738, RunningAvgSamplesPerSec=6.335375761893769, CurrSamplesPerSec=5.738013891085249, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:11:32,462] [INFO] [logging.py:68:log_dist] [Rank 0] step=3370, skipped=5, lr=[3.635555555555556e-06], mom=[[0.9, 0.999]]
[2022-12-17 08:11:32,463] [INFO] [timer.py:197:stop] 0/6740, RunningAvgSamplesPerSec=6.335381584313619, CurrSamplesPerSec=5.710524924063667, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:11:43,708] [INFO] [timer.py:197:stop] 0/6742, RunningAvgSamplesPerSec=6.335393113987534, CurrSamplesPerSec=5.749184147746339, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:11:54,944] [INFO] [timer.py:197:stop] 0/6744, RunningAvgSamplesPerSec=6.335403399337236, CurrSamplesPerSec=5.731368818118538, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:12:06,243] [INFO] [timer.py:197:stop] 0/6746, RunningAvgSamplesPerSec=6.335405800521349, CurrSamplesPerSec=5.707700648614486, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:12:17,513] [INFO] [timer.py:197:stop] 0/6748, RunningAvgSamplesPerSec=6.33541325872857, CurrSamplesPerSec=5.699587743224427, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:12:28,805] [INFO] [timer.py:197:stop] 0/6750, RunningAvgSamplesPerSec=6.335419206168785, CurrSamplesPerSec=5.72087448256261, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0009, 'learning_rate': 3.624444444444445e-06, 'epoch': 14.3}
[2022-12-17 08:12:40,100] [INFO] [timer.py:197:stop] 0/6752, RunningAvgSamplesPerSec=6.335425900995575, CurrSamplesPerSec=5.729924964973949, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:12:51,350] [INFO] [timer.py:197:stop] 0/6754, RunningAvgSamplesPerSec=6.335434128041205, CurrSamplesPerSec=5.727344187421743, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:13:02,597] [INFO] [timer.py:197:stop] 0/6756, RunningAvgSamplesPerSec=6.335443808382267, CurrSamplesPerSec=5.728703359756582, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:13:13,907] [INFO] [timer.py:197:stop] 0/6758, RunningAvgSamplesPerSec=6.335447032977318, CurrSamplesPerSec=5.70918893392257, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:13:25,194] [INFO] [logging.py:68:log_dist] [Rank 0] step=3380, skipped=5, lr=[3.6133333333333336e-06], mom=[[0.9, 0.999]]
[2022-12-17 08:13:25,196] [INFO] [timer.py:197:stop] 0/6760, RunningAvgSamplesPerSec=6.33545070601345, CurrSamplesPerSec=5.69207354103689, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:13:36,570] [INFO] [timer.py:197:stop] 0/6762, RunningAvgSamplesPerSec=6.335457147352796, CurrSamplesPerSec=5.711385146683424, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:13:47,814] [INFO] [timer.py:197:stop] 0/6764, RunningAvgSamplesPerSec=6.335465629789517, CurrSamplesPerSec=5.726456184764834, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:13:59,082] [INFO] [timer.py:197:stop] 0/6766, RunningAvgSamplesPerSec=6.335473141102003, CurrSamplesPerSec=5.7204536359869085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:14:10,357] [INFO] [timer.py:197:stop] 0/6768, RunningAvgSamplesPerSec=6.3354813559791765,
CurrSamplesPerSec=5.714565602211815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:14:21,632] [INFO] [timer.py:197:stop] 0/6770, RunningAvgSamplesPerSec=6.335488909631325, CurrSamplesPerSec=5.728411670041658, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:14:32,882] [INFO] [timer.py:197:stop] 0/6772, RunningAvgSamplesPerSec=6.335497528237216, CurrSamplesPerSec=5.7178209714922055, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:14:44,133] [INFO] [timer.py:197:stop] 0/6774, RunningAvgSamplesPerSec=6.335505795423574, CurrSamplesPerSec=5.716592103813912, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:14:55,421] [INFO] [timer.py:197:stop] 0/6776, RunningAvgSamplesPerSec=6.3355121742745775, CurrSamplesPerSec=5.704353508621425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:15:06,682] [INFO] [timer.py:197:stop] 0/6778, RunningAvgSamplesPerSec=6.335520427485223, CurrSamplesPerSec=5.731632416359113, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:15:17,930] [INFO] [logging.py:68:log_dist] [Rank 0] step=3390, skipped=5, lr=[3.5911111111111112e-06], mom=[[0.9, 0.999]] [2022-12-17 08:15:17,931] [INFO] [timer.py:197:stop] 0/6780, RunningAvgSamplesPerSec=6.335527528500903, CurrSamplesPerSec=5.716105183907788, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:15:29,226] [INFO] [timer.py:197:stop] 0/6782, RunningAvgSamplesPerSec=6.335529292704897, CurrSamplesPerSec=5.703166045845869, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:15:40,494] [INFO] [timer.py:197:stop] 0/6784, RunningAvgSamplesPerSec=6.335539846623111, CurrSamplesPerSec=5.7389356646810805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:15:51,770] [INFO] [timer.py:197:stop] 0/6786, RunningAvgSamplesPerSec=6.3355470731281125, CurrSamplesPerSec=5.725897964153275, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:16:03,056] [INFO] [timer.py:197:stop] 0/6788, RunningAvgSamplesPerSec=6.335558698275204, 
CurrSamplesPerSec=5.750312755131069, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:16:14,329] [INFO] [timer.py:197:stop] 0/6790, RunningAvgSamplesPerSec=6.335565162906134, CurrSamplesPerSec=5.72310068363258, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:16:25,568] [INFO] [timer.py:197:stop] 0/6792, RunningAvgSamplesPerSec=6.3355781705712575, CurrSamplesPerSec=5.7392797188928295, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:16:36,788] [INFO] [timer.py:197:stop] 0/6794, RunningAvgSamplesPerSec=6.3355899210501265, CurrSamplesPerSec=5.747177050776454, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:16:48,027] [INFO] [timer.py:197:stop] 0/6796, RunningAvgSamplesPerSec=6.3356043056202145, CurrSamplesPerSec=5.745012248058093, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:16:59,286] [INFO] [timer.py:197:stop] 0/6798, RunningAvgSamplesPerSec=6.335614858480439, CurrSamplesPerSec=5.725945353710456, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:17:10,581] [INFO] [logging.py:68:log_dist] [Rank 0] step=3400, skipped=5, lr=[3.568888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 08:17:10,583] [INFO] [timer.py:197:stop] 0/6800, RunningAvgSamplesPerSec=6.335619836173475, CurrSamplesPerSec=5.717412507986601, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0008, 'learning_rate': 3.568888888888889e-06, 'epoch': 14.41} [2022-12-17 08:17:21,869] [INFO] [timer.py:197:stop] 0/6802, RunningAvgSamplesPerSec=6.335624512362709, CurrSamplesPerSec=5.709963003305895, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:17:33,131] [INFO] [timer.py:197:stop] 0/6804, RunningAvgSamplesPerSec=6.335632256641158, CurrSamplesPerSec=5.7104974692478, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:17:44,449] [INFO] [timer.py:197:stop] 0/6806, RunningAvgSamplesPerSec=6.3356370688183095, CurrSamplesPerSec=5.702223989970849, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:17:55,758] [INFO] 
[timer.py:197:stop] 0/6808, RunningAvgSamplesPerSec=6.335634554728273, CurrSamplesPerSec=5.681335995843267, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:18:07,044] [INFO] [timer.py:197:stop] 0/6810, RunningAvgSamplesPerSec=6.335642356203787, CurrSamplesPerSec=5.74130463368171, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:18:18,373] [INFO] [timer.py:197:stop] 0/6812, RunningAvgSamplesPerSec=6.335643910205863, CurrSamplesPerSec=5.700655798880245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:18:29,676] [INFO] [timer.py:197:stop] 0/6814, RunningAvgSamplesPerSec=6.335647904715997, CurrSamplesPerSec=5.704953851446727, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:18:40,928] [INFO] [timer.py:197:stop] 0/6816, RunningAvgSamplesPerSec=6.335657061179278, CurrSamplesPerSec=5.731199217585403, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:18:52,279] [INFO] [timer.py:197:stop] 0/6818, RunningAvgSamplesPerSec=6.335654143809877, CurrSamplesPerSec=5.673416040982523, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:19:03,532] [INFO] [logging.py:68:log_dist] [Rank 0] step=3410, skipped=5, lr=[3.5466666666666673e-06], mom=[[0.9, 0.999]] [2022-12-17 08:19:03,533] [INFO] [timer.py:197:stop] 0/6820, RunningAvgSamplesPerSec=6.335662274107223, CurrSamplesPerSec=5.727299707429968, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:19:14,845] [INFO] [timer.py:197:stop] 0/6822, RunningAvgSamplesPerSec=6.335671436628885, CurrSamplesPerSec=5.723870230816352, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:19:26,054] [INFO] [timer.py:197:stop] 0/6824, RunningAvgSamplesPerSec=6.335688201402547, CurrSamplesPerSec=5.752427315123319, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:19:37,330] [INFO] [timer.py:197:stop] 0/6826, RunningAvgSamplesPerSec=6.335696912892801, CurrSamplesPerSec=5.72277906300634, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:19:48,614] [INFO] 
[timer.py:197:stop] 0/6828, RunningAvgSamplesPerSec=6.3357056730619945, CurrSamplesPerSec=5.72484216216728, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:19:59,931] [INFO] [timer.py:197:stop] 0/6830, RunningAvgSamplesPerSec=6.335707479480197, CurrSamplesPerSec=5.703681546110575, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:20:11,197] [INFO] [timer.py:197:stop] 0/6832, RunningAvgSamplesPerSec=6.335715269638013, CurrSamplesPerSec=5.717828766235083, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:20:22,486] [INFO] [timer.py:197:stop] 0/6834, RunningAvgSamplesPerSec=6.335721207825923, CurrSamplesPerSec=5.724875615566773, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:20:33,797] [INFO] [timer.py:197:stop] 0/6836, RunningAvgSamplesPerSec=6.3357270916090345, CurrSamplesPerSec=5.717510173438337, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:20:45,062] [INFO] [timer.py:197:stop] 0/6838, RunningAvgSamplesPerSec=6.335738854291303, CurrSamplesPerSec=5.7381917455513936, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:20:56,337] [INFO] [logging.py:68:log_dist] [Rank 0] step=3420, skipped=5, lr=[3.524444444444445e-06], mom=[[0.9, 0.999]] [2022-12-17 08:20:56,339] [INFO] [timer.py:197:stop] 0/6840, RunningAvgSamplesPerSec=6.335748624177438, CurrSamplesPerSec=5.712516466537226, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:21:07,619] [INFO] [timer.py:197:stop] 0/6842, RunningAvgSamplesPerSec=6.3357580202583454, CurrSamplesPerSec=5.729844731471844, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:21:18,896] [INFO] [timer.py:197:stop] 0/6844, RunningAvgSamplesPerSec=6.335764802157587, CurrSamplesPerSec=5.718550359160185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:21:30,178] [INFO] [timer.py:197:stop] 0/6846, RunningAvgSamplesPerSec=6.335770762004364, CurrSamplesPerSec=5.708233720020049, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:21:41,441] [INFO] 
[timer.py:197:stop] 0/6848, RunningAvgSamplesPerSec=6.335778103959621, CurrSamplesPerSec=5.700215407405394, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:21:52,731] [INFO] [timer.py:197:stop] 0/6850, RunningAvgSamplesPerSec=6.335784778141418, CurrSamplesPerSec=5.717134630278461, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0006, 'learning_rate': 3.5133333333333337e-06, 'epoch': 14.51} [2022-12-17 08:22:04,061] [INFO] [timer.py:197:stop] 0/6852, RunningAvgSamplesPerSec=6.335783966524976, CurrSamplesPerSec=5.688345923740799, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:22:15,354] [INFO] [timer.py:197:stop] 0/6854, RunningAvgSamplesPerSec=6.335788130663457, CurrSamplesPerSec=5.720207398745991, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:22:26,618] [INFO] [timer.py:197:stop] 0/6856, RunningAvgSamplesPerSec=6.335797860329845, CurrSamplesPerSec=5.732850129006261, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:22:37,892] [INFO] [timer.py:197:stop] 0/6858, RunningAvgSamplesPerSec=6.335802709178029, CurrSamplesPerSec=5.71466876670745, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:22:49,155] [INFO] [logging.py:68:log_dist] [Rank 0] step=3430, skipped=5, lr=[3.5022222222222225e-06], mom=[[0.9, 0.999]] [2022-12-17 08:22:49,156] [INFO] [timer.py:197:stop] 0/6860, RunningAvgSamplesPerSec=6.335808399079511, CurrSamplesPerSec=5.727941557107226, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:23:00,440] [INFO] [timer.py:197:stop] 0/6862, RunningAvgSamplesPerSec=6.335816373501924, CurrSamplesPerSec=5.718228762504417, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:23:11,737] [INFO] [timer.py:197:stop] 0/6864, RunningAvgSamplesPerSec=6.335820645429806, CurrSamplesPerSec=5.706740837128679, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:23:22,999] [INFO] [timer.py:197:stop] 0/6866, RunningAvgSamplesPerSec=6.335828758770827, CurrSamplesPerSec=5.730471982504473, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:23:34,230] [INFO] [timer.py:197:stop] 0/6868, RunningAvgSamplesPerSec=6.335839943703662, CurrSamplesPerSec=5.733005624138695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:23:45,488] [INFO] [timer.py:197:stop] 0/6870, RunningAvgSamplesPerSec=6.335847727715519, CurrSamplesPerSec=5.702430886064505, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:23:56,774] [INFO] [timer.py:197:stop] 0/6872, RunningAvgSamplesPerSec=6.335855739519945, CurrSamplesPerSec=5.7183028239827856, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:24:08,043] [INFO] [timer.py:197:stop] 0/6874, RunningAvgSamplesPerSec=6.335864068228962, CurrSamplesPerSec=5.727191198700244, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:24:19,291] [INFO] [timer.py:197:stop] 0/6876, RunningAvgSamplesPerSec=6.33587683092059, CurrSamplesPerSec=5.749437811807974, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:24:30,554] [INFO] [timer.py:197:stop] 0/6878, RunningAvgSamplesPerSec=6.3358862520744275, CurrSamplesPerSec=5.734842329116901, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:24:41,826] [INFO] [logging.py:68:log_dist] [Rank 0] step=3440, skipped=5, lr=[3.48e-06], mom=[[0.9, 0.999]] [2022-12-17 08:24:41,828] [INFO] [timer.py:197:stop] 0/6880, RunningAvgSamplesPerSec=6.335890426523647, CurrSamplesPerSec=5.705601616605988, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:24:53,089] [INFO] [timer.py:197:stop] 0/6882, RunningAvgSamplesPerSec=6.335901324756337, CurrSamplesPerSec=5.740607978399688, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:25:04,361] [INFO] [timer.py:197:stop] 0/6884, RunningAvgSamplesPerSec=6.335911221101385, CurrSamplesPerSec=5.7418000321618115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:25:15,641] [INFO] [timer.py:197:stop] 0/6886, RunningAvgSamplesPerSec=6.335917392893157, CurrSamplesPerSec=5.711138474432159, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 08:25:27,040] [INFO] [timer.py:197:stop] 0/6888, RunningAvgSamplesPerSec=6.335900279695778, CurrSamplesPerSec=5.609052065020303, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:25:38,349] [INFO] [timer.py:197:stop] 0/6890, RunningAvgSamplesPerSec=6.335900315427077, CurrSamplesPerSec=5.696869058630994, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:25:49,666] [INFO] [timer.py:197:stop] 0/6892, RunningAvgSamplesPerSec=6.335898075337393, CurrSamplesPerSec=5.699606379930252, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:26:00,949] [INFO] [timer.py:197:stop] 0/6894, RunningAvgSamplesPerSec=6.335902164159549, CurrSamplesPerSec=5.718328892130646, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:26:12,240] [INFO] [timer.py:197:stop] 0/6896, RunningAvgSamplesPerSec=6.335904938835898, CurrSamplesPerSec=5.701893811029447, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:26:23,574] [INFO] [timer.py:197:stop] 0/6898, RunningAvgSamplesPerSec=6.3359007400630984, CurrSamplesPerSec=5.69692273951494, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:26:34,862] [INFO] [logging.py:68:log_dist] [Rank 0] step=3450, skipped=5, lr=[3.457777777777778e-06], mom=[[0.9, 0.999]] [2022-12-17 08:26:34,863] [INFO] [timer.py:197:stop] 0/6900, RunningAvgSamplesPerSec=6.335905375702142, CurrSamplesPerSec=5.716917899546148, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0006, 'learning_rate': 3.457777777777778e-06, 'epoch': 14.62} [2022-12-17 08:26:46,183] [INFO] [timer.py:197:stop] 0/6902, RunningAvgSamplesPerSec=6.335905797329182, CurrSamplesPerSec=5.692203415362899, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:26:57,512] [INFO] [timer.py:197:stop] 0/6904, RunningAvgSamplesPerSec=6.335902977951562, CurrSamplesPerSec=5.677874616513825, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:27:08,810] [INFO] [timer.py:197:stop] 0/6906, 
RunningAvgSamplesPerSec=6.335904392747657, CurrSamplesPerSec=5.695849314014534, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:27:20,096] [INFO] [timer.py:197:stop] 0/6908, RunningAvgSamplesPerSec=6.335910373994993, CurrSamplesPerSec=5.736314158738321, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:27:31,401] [INFO] [timer.py:197:stop] 0/6910, RunningAvgSamplesPerSec=6.335912398613864, CurrSamplesPerSec=5.71390071854944, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:27:42,692] [INFO] [timer.py:197:stop] 0/6912, RunningAvgSamplesPerSec=6.335915392107247, CurrSamplesPerSec=5.7122050293280004, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:27:53,995] [INFO] [timer.py:197:stop] 0/6914, RunningAvgSamplesPerSec=6.335916733915622, CurrSamplesPerSec=5.725665668959716, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:28:05,313] [INFO] [timer.py:197:stop] 0/6916, RunningAvgSamplesPerSec=6.335914598370398, CurrSamplesPerSec=5.679297640686226, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:28:16,631] [INFO] [timer.py:197:stop] 0/6918, RunningAvgSamplesPerSec=6.335915782775784, CurrSamplesPerSec=5.700484621597277, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:28:27,974] [INFO] [logging.py:68:log_dist] [Rank 0] step=3460, skipped=5, lr=[3.435555555555556e-06], mom=[[0.9, 0.999]] [2022-12-17 08:28:27,976] [INFO] [timer.py:197:stop] 0/6920, RunningAvgSamplesPerSec=6.335913270570877, CurrSamplesPerSec=5.707983921431771, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:28:39,318] [INFO] [timer.py:197:stop] 0/6922, RunningAvgSamplesPerSec=6.33590842048816, CurrSamplesPerSec=5.68215954167032, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:28:50,626] [INFO] [timer.py:197:stop] 0/6924, RunningAvgSamplesPerSec=6.335910052087327, CurrSamplesPerSec=5.704671364224143, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:29:01,927] [INFO] [timer.py:197:stop] 0/6926, 
RunningAvgSamplesPerSec=6.335912457522142, CurrSamplesPerSec=5.695688576781109, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:29:13,222] [INFO] [timer.py:197:stop] 0/6928, RunningAvgSamplesPerSec=6.3359141317974705, CurrSamplesPerSec=5.6967387295234175, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:29:24,527] [INFO] [timer.py:197:stop] 0/6930, RunningAvgSamplesPerSec=6.335916105885563, CurrSamplesPerSec=5.687891523266612, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:29:35,825] [INFO] [timer.py:197:stop] 0/6932, RunningAvgSamplesPerSec=6.33591708655648, CurrSamplesPerSec=5.690714320118148, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:29:47,140] [INFO] [timer.py:197:stop] 0/6934, RunningAvgSamplesPerSec=6.335917314739031, CurrSamplesPerSec=5.708687248940789, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:29:58,455] [INFO] [timer.py:197:stop] 0/6936, RunningAvgSamplesPerSec=6.335917005333467, CurrSamplesPerSec=5.681345615327618, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:30:09,741] [INFO] [timer.py:197:stop] 0/6938, RunningAvgSamplesPerSec=6.335926384666303, CurrSamplesPerSec=5.7203936594189075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:30:21,016] [INFO] [logging.py:68:log_dist] [Rank 0] step=3470, skipped=5, lr=[3.4133333333333334e-06], mom=[[0.9, 0.999]] [2022-12-17 08:30:21,017] [INFO] [timer.py:197:stop] 0/6940, RunningAvgSamplesPerSec=6.335933667012311, CurrSamplesPerSec=5.712208919044687, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:30:32,294] [INFO] [timer.py:197:stop] 0/6942, RunningAvgSamplesPerSec=6.335941308188668, CurrSamplesPerSec=5.7301470865111455, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:30:43,621] [INFO] [timer.py:197:stop] 0/6944, RunningAvgSamplesPerSec=6.3359411875298095, CurrSamplesPerSec=5.714233019399983, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:30:55,097] [INFO] [timer.py:197:stop] 0/6946, 
RunningAvgSamplesPerSec=6.335946099866804, CurrSamplesPerSec=5.717290004668645, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:31:06,462] [INFO] [timer.py:197:stop] 0/6948, RunningAvgSamplesPerSec=6.3359497882734175, CurrSamplesPerSec=5.712707089669525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:31:17,758] [INFO] [timer.py:197:stop] 0/6950, RunningAvgSamplesPerSec=6.335949974405662, CurrSamplesPerSec=5.708783402445716, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0006, 'learning_rate': 3.4022222222222222e-06, 'epoch': 14.72} [2022-12-17 08:31:29,055] [INFO] [timer.py:197:stop] 0/6952, RunningAvgSamplesPerSec=6.335951038316049, CurrSamplesPerSec=5.677335192591269, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:31:40,326] [INFO] [timer.py:197:stop] 0/6954, RunningAvgSamplesPerSec=6.335954671614913, CurrSamplesPerSec=5.711804659953133, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:31:51,641] [INFO] [timer.py:197:stop] 0/6956, RunningAvgSamplesPerSec=6.335953433679765, CurrSamplesPerSec=5.7017620408218574, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:32:02,923] [INFO] [timer.py:197:stop] 0/6958, RunningAvgSamplesPerSec=6.335957884016047, CurrSamplesPerSec=5.715676519597433, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:32:14,226] [INFO] [logging.py:68:log_dist] [Rank 0] step=3480, skipped=5, lr=[3.391111111111111e-06], mom=[[0.9, 0.999]] [2022-12-17 08:32:14,227] [INFO] [timer.py:197:stop] 0/6960, RunningAvgSamplesPerSec=6.3359596235734665, CurrSamplesPerSec=5.70464784509039, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:32:25,477] [INFO] [timer.py:197:stop] 0/6962, RunningAvgSamplesPerSec=6.335971526534016, CurrSamplesPerSec=5.738433890664609, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:32:36,769] [INFO] [timer.py:197:stop] 0/6964, RunningAvgSamplesPerSec=6.335978612982931, CurrSamplesPerSec=5.727787558729274, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 08:32:48,040] [INFO] [timer.py:197:stop] 0/6966, RunningAvgSamplesPerSec=6.335986047607533, CurrSamplesPerSec=5.71634717273252, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:32:59,310] [INFO] [timer.py:197:stop] 0/6968, RunningAvgSamplesPerSec=6.335992538080176, CurrSamplesPerSec=5.719552169637749, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:33:10,601] [INFO] [timer.py:197:stop] 0/6970, RunningAvgSamplesPerSec=6.335995701230729, CurrSamplesPerSec=5.7108582903242375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:33:21,893] [INFO] [timer.py:197:stop] 0/6972, RunningAvgSamplesPerSec=6.335996009422708, CurrSamplesPerSec=5.704970340835455, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:33:33,190] [INFO] [timer.py:197:stop] 0/6974, RunningAvgSamplesPerSec=6.335998452897874, CurrSamplesPerSec=5.711568159668565, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:33:44,470] [INFO] [timer.py:197:stop] 0/6976, RunningAvgSamplesPerSec=6.336001605932553, CurrSamplesPerSec=5.7131795689290055, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:33:55,763] [INFO] [timer.py:197:stop] 0/6978, RunningAvgSamplesPerSec=6.336006264641368, CurrSamplesPerSec=5.7205270236611, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:34:07,028] [INFO] [logging.py:68:log_dist] [Rank 0] step=3490, skipped=5, lr=[3.3688888888888895e-06], mom=[[0.9, 0.999]] [2022-12-17 08:34:07,030] [INFO] [timer.py:197:stop] 0/6980, RunningAvgSamplesPerSec=6.3360142860991475, CurrSamplesPerSec=5.741111360582159, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:34:18,336] [INFO] [timer.py:197:stop] 0/6982, RunningAvgSamplesPerSec=6.336015650217596, CurrSamplesPerSec=5.715640252824482, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:34:29,634] [INFO] [timer.py:197:stop] 0/6984, RunningAvgSamplesPerSec=6.336015398383265, CurrSamplesPerSec=5.694377154970248, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 08:34:40,938] [INFO] [timer.py:197:stop] 0/6986, RunningAvgSamplesPerSec=6.336013926721686, CurrSamplesPerSec=5.701118779608611, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:34:52,254] [INFO] [timer.py:197:stop] 0/6988, RunningAvgSamplesPerSec=6.336016999623785, CurrSamplesPerSec=5.708849691975506, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:35:03,578] [INFO] [timer.py:197:stop] 0/6990, RunningAvgSamplesPerSec=6.336015230397008, CurrSamplesPerSec=5.689597889276906, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:35:14,843] [INFO] [timer.py:197:stop] 0/6992, RunningAvgSamplesPerSec=6.336022713412811, CurrSamplesPerSec=5.721351728246623, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:35:26,128] [INFO] [timer.py:197:stop] 0/6994, RunningAvgSamplesPerSec=6.3360238162424, CurrSamplesPerSec=5.69377396361336, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:35:37,338] [INFO] [timer.py:197:stop] 0/6996, RunningAvgSamplesPerSec=6.336030134866878, CurrSamplesPerSec=5.723018444873249, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:35:48,651] [INFO] [timer.py:197:stop] 0/6998, RunningAvgSamplesPerSec=6.336030827930519, CurrSamplesPerSec=5.707811332846688, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:35:59,959] [INFO] [logging.py:68:log_dist] [Rank 0] step=3500, skipped=5, lr=[3.346666666666667e-06], mom=[[0.9, 0.999]] [2022-12-17 08:35:59,961] [INFO] [timer.py:197:stop] 0/7000, RunningAvgSamplesPerSec=6.3360310804451325, CurrSamplesPerSec=5.698258800513624, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0007, 'learning_rate': 3.346666666666667e-06, 'epoch': 14.83} [2022-12-17 08:36:11,249] [INFO] [timer.py:197:stop] 0/7002, RunningAvgSamplesPerSec=6.336033006654374, CurrSamplesPerSec=5.708592798272643, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:36:22,512] [INFO] [timer.py:197:stop] 0/7004, RunningAvgSamplesPerSec=6.336039423997487, 
CurrSamplesPerSec=5.725660295371774, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:36:33,824] [INFO] [timer.py:197:stop] 0/7006, RunningAvgSamplesPerSec=6.336040422932118, CurrSamplesPerSec=5.715054693343505, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:36:45,159] [INFO] [timer.py:197:stop] 0/7008, RunningAvgSamplesPerSec=6.3360402778094, CurrSamplesPerSec=5.694589763880119, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:36:56,492] [INFO] [timer.py:197:stop] 0/7010, RunningAvgSamplesPerSec=6.336039235706016, CurrSamplesPerSec=5.71252570563038, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:37:07,829] [INFO] [timer.py:197:stop] 0/7012, RunningAvgSamplesPerSec=6.3360369664535945, CurrSamplesPerSec=5.674658803355623, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:37:19,106] [INFO] [timer.py:197:stop] 0/7014, RunningAvgSamplesPerSec=6.336042286637709, CurrSamplesPerSec=5.713445631272786, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:37:30,452] [INFO] [timer.py:197:stop] 0/7016, RunningAvgSamplesPerSec=6.336035252063371, CurrSamplesPerSec=5.665982842191559, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:37:41,744] [INFO] [timer.py:197:stop] 0/7018, RunningAvgSamplesPerSec=6.336039430102445, CurrSamplesPerSec=5.712375695630346, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:37:53,057] [INFO] [logging.py:68:log_dist] [Rank 0] step=3510, skipped=5, lr=[3.3244444444444447e-06], mom=[[0.9, 0.999]] [2022-12-17 08:37:53,058] [INFO] [timer.py:197:stop] 0/7020, RunningAvgSamplesPerSec=6.336039328173379, CurrSamplesPerSec=5.693599817998553, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:38:04,340] [INFO] [timer.py:197:stop] 0/7022, RunningAvgSamplesPerSec=6.336045469751685, CurrSamplesPerSec=5.728534650432044, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:38:15,651] [INFO] [timer.py:197:stop] 0/7024, RunningAvgSamplesPerSec=6.336046339089011, 
CurrSamplesPerSec=5.6930873458324065, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:38:27,001] [INFO] [timer.py:197:stop] 0/7026, RunningAvgSamplesPerSec=6.336042450409582, CurrSamplesPerSec=5.693363132300313, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:38:38,326] [INFO] [timer.py:197:stop] 0/7028, RunningAvgSamplesPerSec=6.3360397271690445, CurrSamplesPerSec=5.682382786805048, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:38:49,613] [INFO] [timer.py:197:stop] 0/7030, RunningAvgSamplesPerSec=6.3360431729577815, CurrSamplesPerSec=5.709435194920618, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:39:00,942] [INFO] [timer.py:197:stop] 0/7032, RunningAvgSamplesPerSec=6.336039455924966, CurrSamplesPerSec=5.684558179697593, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:39:12,245] [INFO] [timer.py:197:stop] 0/7034, RunningAvgSamplesPerSec=6.336040089069696, CurrSamplesPerSec=5.69708789920933, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:39:23,556] [INFO] [timer.py:197:stop] 0/7036, RunningAvgSamplesPerSec=6.336039921529636, CurrSamplesPerSec=5.709073339439505, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:39:34,893] [INFO] [timer.py:197:stop] 0/7038, RunningAvgSamplesPerSec=6.336036874526808, CurrSamplesPerSec=5.694211186094316, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:39:46,334] [INFO] [logging.py:68:log_dist] [Rank 0] step=3520, skipped=5, lr=[3.3022222222222223e-06], mom=[[0.9, 0.999]] [2022-12-17 08:39:46,336] [INFO] [timer.py:197:stop] 0/7040, RunningAvgSamplesPerSec=6.336048338439192, CurrSamplesPerSec=5.736273216710899, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:39:57,602] [INFO] [timer.py:197:stop] 0/7042, RunningAvgSamplesPerSec=6.336055473638131, CurrSamplesPerSec=5.747396574121967, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 08:40:08,902] [INFO] [timer.py:197:stop] 0/7044, RunningAvgSamplesPerSec=6.33605758570761, 
CurrSamplesPerSec=5.697080402734628, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:40:20,185] [INFO] [timer.py:197:stop] 0/7046, RunningAvgSamplesPerSec=6.336061026686192, CurrSamplesPerSec=5.710510832178299, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:40:31,484] [INFO] [timer.py:197:stop] 0/7048, RunningAvgSamplesPerSec=6.3360633201784, CurrSamplesPerSec=5.714192635242332, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:40:42,778] [INFO] [timer.py:197:stop] 0/7050, RunningAvgSamplesPerSec=6.336067257937258, CurrSamplesPerSec=5.706152732937158, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0007, 'learning_rate': 3.2911111111111116e-06, 'epoch': 14.94}
[2022-12-17 08:40:54,074] [INFO] [timer.py:197:stop] 0/7052, RunningAvgSamplesPerSec=6.336070154814537, CurrSamplesPerSec=5.722531893571896, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:41:05,367] [INFO] [timer.py:197:stop] 0/7054, RunningAvgSamplesPerSec=6.336073540324857, CurrSamplesPerSec=5.701336978082232, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:41:16,666] [INFO] [timer.py:197:stop] 0/7056, RunningAvgSamplesPerSec=6.336075504694328, CurrSamplesPerSec=5.718039719534393, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:41:27,914] [INFO] [timer.py:197:stop] 0/7058, RunningAvgSamplesPerSec=6.336083962326727, CurrSamplesPerSec=5.726415871971864, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:41:39,192] [INFO] [logging.py:68:log_dist] [Rank 0] step=3530, skipped=5, lr=[3.2800000000000004e-06], mom=[[0.9, 0.999]]
[2022-12-17 08:41:39,194] [INFO] [timer.py:197:stop] 0/7060, RunningAvgSamplesPerSec=6.3360858012627155, CurrSamplesPerSec=5.694563428505969, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:41:50,461] [INFO] [timer.py:197:stop] 0/7062, RunningAvgSamplesPerSec=6.3360878851830895, CurrSamplesPerSec=5.695557334911018, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:42:01,756] [INFO] [timer.py:197:stop] 0/7064, RunningAvgSamplesPerSec=6.336091603295266, CurrSamplesPerSec=5.708733382793981, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:42:13,057] [INFO] [timer.py:197:stop] 0/7066, RunningAvgSamplesPerSec=6.336093431862512, CurrSamplesPerSec=5.712168563491688, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:42:24,335] [INFO] [timer.py:197:stop] 0/7068, RunningAvgSamplesPerSec=6.3360968188326225, CurrSamplesPerSec=5.7172568834007444, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:42:35,613] [INFO] [timer.py:197:stop] 0/7070, RunningAvgSamplesPerSec=6.3361020476538545, CurrSamplesPerSec=5.709504171237318, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:42:46,948] [INFO] [timer.py:197:stop] 0/7072, RunningAvgSamplesPerSec=6.3360998344648705, CurrSamplesPerSec=5.685475140006454, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:42:58,252] [INFO] [timer.py:197:stop] 0/7074, RunningAvgSamplesPerSec=6.3360977220499555, CurrSamplesPerSec=5.704194472889636, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:43:09,577] [INFO] [timer.py:197:stop] 0/7076, RunningAvgSamplesPerSec=6.336097357975871, CurrSamplesPerSec=5.691573169905834, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:43:20,894] [INFO] [timer.py:197:stop] 0/7078, RunningAvgSamplesPerSec=6.336095189587284, CurrSamplesPerSec=5.695575461881137, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:43:29,363] [INFO] [logging.py:68:log_dist] [Rank 0] step=3540, skipped=5, lr=[3.257777777777778e-06], mom=[[0.9, 0.999]]
[2022-12-17 08:43:29,365] [INFO] [timer.py:197:stop] 0/7080, RunningAvgSamplesPerSec=6.336537019491826, CurrSamplesPerSec=10.248503148784106, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:43:40,680] [INFO] [timer.py:197:stop] 0/7082, RunningAvgSamplesPerSec=6.336535244705349, CurrSamplesPerSec=5.7099207361778825, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:43:51,992] [INFO] [timer.py:197:stop] 0/7084, RunningAvgSamplesPerSec=6.33653543670802, CurrSamplesPerSec=5.7024248291739665, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:44:03,287] [INFO] [timer.py:197:stop] 0/7086, RunningAvgSamplesPerSec=6.336538873598614, CurrSamplesPerSec=5.714536892000577, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:44:14,627] [INFO] [timer.py:197:stop] 0/7088, RunningAvgSamplesPerSec=6.336534355904899, CurrSamplesPerSec=5.702760884907398, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:44:25,917] [INFO] [timer.py:197:stop] 0/7090, RunningAvgSamplesPerSec=6.336532799418478, CurrSamplesPerSec=5.703959329597068, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:44:37,187] [INFO] [timer.py:197:stop] 0/7092, RunningAvgSamplesPerSec=6.336539300908651, CurrSamplesPerSec=5.730101339808488, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:44:48,486] [INFO] [timer.py:197:stop] 0/7094, RunningAvgSamplesPerSec=6.336541123049372, CurrSamplesPerSec=5.716769850245525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:44:59,768] [INFO] [timer.py:197:stop] 0/7096, RunningAvgSamplesPerSec=6.336539637361981, CurrSamplesPerSec=5.7223798937525965, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:45:11,052] [INFO] [timer.py:197:stop] 0/7098, RunningAvgSamplesPerSec=6.33654313167868, CurrSamplesPerSec=5.709941140920177, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:45:22,316] [INFO] [logging.py:68:log_dist] [Rank 0] step=3550, skipped=5, lr=[3.2355555555555556e-06], mom=[[0.9, 0.999]]
[2022-12-17 08:45:22,318] [INFO] [timer.py:197:stop] 0/7100, RunningAvgSamplesPerSec=6.336550468912335, CurrSamplesPerSec=5.725339119683709, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0006, 'learning_rate': 3.2355555555555556e-06, 'epoch': 15.04}
[2022-12-17 08:45:33,597] [INFO] [timer.py:197:stop] 0/7102, RunningAvgSamplesPerSec=6.336555961465411, CurrSamplesPerSec=5.706621702272997, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:45:44,865] [INFO] [timer.py:197:stop] 0/7104, RunningAvgSamplesPerSec=6.336565479135939, CurrSamplesPerSec=5.7179184073056994, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:45:56,203] [INFO] [timer.py:197:stop] 0/7106, RunningAvgSamplesPerSec=6.33656338842234, CurrSamplesPerSec=5.695559026756681, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:46:07,461] [INFO] [timer.py:197:stop] 0/7108, RunningAvgSamplesPerSec=6.33657199020889, CurrSamplesPerSec=5.729104145368554, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:46:18,727] [INFO] [timer.py:197:stop] 0/7110, RunningAvgSamplesPerSec=6.3365792546897355, CurrSamplesPerSec=5.70705580414679, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:46:29,992] [INFO] [timer.py:197:stop] 0/7112, RunningAvgSamplesPerSec=6.336586540261577, CurrSamplesPerSec=5.726482327303658, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:46:41,286] [INFO] [timer.py:197:stop] 0/7114, RunningAvgSamplesPerSec=6.336590431273355, CurrSamplesPerSec=5.718244354235172, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:46:52,577] [INFO] [timer.py:197:stop] 0/7116, RunningAvgSamplesPerSec=6.336594859000273, CurrSamplesPerSec=5.7176119828726035, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:47:03,929] [INFO] [timer.py:197:stop] 0/7118, RunningAvgSamplesPerSec=6.336587273287749, CurrSamplesPerSec=5.647525654346897, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:47:15,219] [INFO] [logging.py:68:log_dist] [Rank 0] step=3560, skipped=5, lr=[3.213333333333334e-06], mom=[[0.9, 0.999]]
[2022-12-17 08:47:15,220] [INFO] [timer.py:197:stop] 0/7120, RunningAvgSamplesPerSec=6.336591838649718, CurrSamplesPerSec=5.72518770334102, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:47:26,510] [INFO] [timer.py:197:stop] 0/7122, RunningAvgSamplesPerSec=6.336596657755495, CurrSamplesPerSec=5.721397091458818, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:47:37,763] [INFO] [timer.py:197:stop] 0/7124, RunningAvgSamplesPerSec=6.336604749335138, CurrSamplesPerSec=5.7333793366997465, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:47:49,080] [INFO] [timer.py:197:stop] 0/7126, RunningAvgSamplesPerSec=6.336609154333121, CurrSamplesPerSec=5.721207351198136, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:48:00,371] [INFO] [timer.py:197:stop] 0/7128, RunningAvgSamplesPerSec=6.336612749558974, CurrSamplesPerSec=5.717910855911447, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:48:11,629] [INFO] [timer.py:197:stop] 0/7130, RunningAvgSamplesPerSec=6.336622000394005, CurrSamplesPerSec=5.724012789389099, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:48:22,953] [INFO] [timer.py:197:stop] 0/7132, RunningAvgSamplesPerSec=6.336620699917402, CurrSamplesPerSec=5.681224652682598, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:48:34,213] [INFO] [timer.py:197:stop] 0/7134, RunningAvgSamplesPerSec=6.336627883073419, CurrSamplesPerSec=5.721560258636635, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:48:45,473] [INFO] [timer.py:197:stop] 0/7136, RunningAvgSamplesPerSec=6.336637623973165, CurrSamplesPerSec=5.725201135112214, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:48:56,737] [INFO] [timer.py:197:stop] 0/7138, RunningAvgSamplesPerSec=6.3366465793779305, CurrSamplesPerSec=5.730776116579845, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:49:07,989] [INFO] [logging.py:68:log_dist] [Rank 0] step=3570, skipped=5, lr=[3.1911111111111117e-06], mom=[[0.9, 0.999]]
[2022-12-17 08:49:07,989] [INFO] [timer.py:197:stop] 0/7140, RunningAvgSamplesPerSec=6.3366573634205965, CurrSamplesPerSec=5.756063900681472, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:49:19,275] [INFO] [timer.py:197:stop] 0/7142, RunningAvgSamplesPerSec=6.336667028427543, CurrSamplesPerSec=5.72736545007946, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:49:30,524] [INFO] [timer.py:197:stop] 0/7144, RunningAvgSamplesPerSec=6.336676511407681, CurrSamplesPerSec=5.725585554694367, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:49:41,775] [INFO] [timer.py:197:stop] 0/7146, RunningAvgSamplesPerSec=6.336686159963732, CurrSamplesPerSec=5.7252008908976295, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:49:53,178] [INFO] [timer.py:197:stop] 0/7148, RunningAvgSamplesPerSec=6.336691032030077, CurrSamplesPerSec=5.711277483169063, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:50:04,418] [INFO] [timer.py:197:stop] 0/7150, RunningAvgSamplesPerSec=6.3367025598086615, CurrSamplesPerSec=5.74369571437387, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0006, 'learning_rate': 3.1800000000000005e-06, 'epoch': 15.15}
[2022-12-17 08:50:15,683] [INFO] [timer.py:197:stop] 0/7152, RunningAvgSamplesPerSec=6.336711577607592, CurrSamplesPerSec=5.735962370560981, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:50:26,909] [INFO] [timer.py:197:stop] 0/7154, RunningAvgSamplesPerSec=6.336721119542908, CurrSamplesPerSec=5.736889370195829, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:50:38,183] [INFO] [timer.py:197:stop] 0/7156, RunningAvgSamplesPerSec=6.336727910475315, CurrSamplesPerSec=5.719653320583004, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:50:49,437] [INFO] [timer.py:197:stop] 0/7158, RunningAvgSamplesPerSec=6.336736546883692, CurrSamplesPerSec=5.747683062543447, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:51:00,702] [INFO] [logging.py:68:log_dist] [Rank 0] step=3580, skipped=5, lr=[3.1688888888888893e-06], mom=[[0.9, 0.999]]
[2022-12-17 08:51:00,703] [INFO] [timer.py:197:stop] 0/7160, RunningAvgSamplesPerSec=6.3367400491662975, CurrSamplesPerSec=5.724335525268068, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:51:11,984] [INFO] [timer.py:197:stop] 0/7162, RunningAvgSamplesPerSec=6.336746149962355, CurrSamplesPerSec=5.696210945525485, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:51:23,262] [INFO] [timer.py:197:stop] 0/7164, RunningAvgSamplesPerSec=6.336751783981821, CurrSamplesPerSec=5.703505097354939, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:51:34,557] [INFO] [timer.py:197:stop] 0/7166, RunningAvgSamplesPerSec=6.336751668168491, CurrSamplesPerSec=5.679854743571504, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:51:45,838] [INFO] [timer.py:197:stop] 0/7168, RunningAvgSamplesPerSec=6.3367593201336785, CurrSamplesPerSec=5.742135831748037, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:51:57,139] [INFO] [timer.py:197:stop] 0/7170, RunningAvgSamplesPerSec=6.336761339660549, CurrSamplesPerSec=5.705994082024692, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:52:08,614] [INFO] [timer.py:197:stop] 0/7172, RunningAvgSamplesPerSec=6.336769390384303, CurrSamplesPerSec=5.7323410945044655, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:52:19,853] [INFO] [timer.py:197:stop] 0/7174, RunningAvgSamplesPerSec=6.336781230655281, CurrSamplesPerSec=5.7495683466982435, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:52:31,087] [INFO] [timer.py:197:stop] 0/7176, RunningAvgSamplesPerSec=6.336794208617129, CurrSamplesPerSec=5.748378728034682, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:52:42,362] [INFO] [timer.py:197:stop] 0/7178, RunningAvgSamplesPerSec=6.336800609932669, CurrSamplesPerSec=5.710724647572078, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:52:53,636] [INFO] [logging.py:68:log_dist] [Rank 0] step=3590, skipped=5, lr=[3.146666666666667e-06], mom=[[0.9, 0.999]]
[2022-12-17 08:52:53,638] [INFO] [timer.py:197:stop] 0/7180, RunningAvgSamplesPerSec=6.3368073192102194, CurrSamplesPerSec=5.725770211677484, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:53:04,925] [INFO] [timer.py:197:stop] 0/7182, RunningAvgSamplesPerSec=6.336812826879142, CurrSamplesPerSec=5.709809970121474, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:53:16,221] [INFO] [timer.py:197:stop] 0/7184, RunningAvgSamplesPerSec=6.336816386862075, CurrSamplesPerSec=5.710592954833432, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:53:27,496] [INFO] [timer.py:197:stop] 0/7186, RunningAvgSamplesPerSec=6.336823078431179, CurrSamplesPerSec=5.733743545157348, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:53:38,771] [INFO] [timer.py:197:stop] 0/7188, RunningAvgSamplesPerSec=6.336827209844939, CurrSamplesPerSec=5.720442420760707, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:53:50,057] [INFO] [timer.py:197:stop] 0/7190, RunningAvgSamplesPerSec=6.336832873580546, CurrSamplesPerSec=5.706346570719739, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:54:01,335] [INFO] [timer.py:197:stop] 0/7192, RunningAvgSamplesPerSec=6.336842179287273, CurrSamplesPerSec=5.726127835258792, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:54:12,593] [INFO] [timer.py:197:stop] 0/7194, RunningAvgSamplesPerSec=6.336850813089948, CurrSamplesPerSec=5.735628518137128, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:54:23,825] [INFO] [timer.py:197:stop] 0/7196, RunningAvgSamplesPerSec=6.336863840754564, CurrSamplesPerSec=5.7346649272389785, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:54:35,047] [INFO] [timer.py:197:stop] 0/7198, RunningAvgSamplesPerSec=6.336880314132211, CurrSamplesPerSec=5.759368218107903, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:54:46,281] [INFO] [logging.py:68:log_dist] [Rank 0] step=3600, skipped=5, lr=[3.124444444444445e-06], mom=[[0.9, 0.999]]
[2022-12-17 08:54:46,282] [INFO] [timer.py:197:stop] 0/7200, RunningAvgSamplesPerSec=6.336892118427297, CurrSamplesPerSec=5.752183248066266, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0007, 'learning_rate': 3.124444444444445e-06, 'epoch': 15.25}
[2022-12-17 08:54:57,558] [INFO] [timer.py:197:stop] 0/7202, RunningAvgSamplesPerSec=6.336900840680547, CurrSamplesPerSec=5.720915448919804, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:55:08,863] [INFO] [timer.py:197:stop] 0/7204, RunningAvgSamplesPerSec=6.336903865225444, CurrSamplesPerSec=5.696353096685793, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:55:20,130] [INFO] [timer.py:197:stop] 0/7206, RunningAvgSamplesPerSec=6.336912767375875, CurrSamplesPerSec=5.728348103719084, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:55:31,375] [INFO] [timer.py:197:stop] 0/7208, RunningAvgSamplesPerSec=6.336919776426203, CurrSamplesPerSec=5.717917920118372, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:55:42,843] [INFO] [timer.py:197:stop] 0/7210, RunningAvgSamplesPerSec=6.336926136940396, CurrSamplesPerSec=5.734597791822031, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:55:54,116] [INFO] [timer.py:197:stop] 0/7212, RunningAvgSamplesPerSec=6.336932166862068, CurrSamplesPerSec=5.730988026336347, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:56:05,383] [INFO] [timer.py:197:stop] 0/7214, RunningAvgSamplesPerSec=6.336941030459985, CurrSamplesPerSec=5.731765081284228, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:56:16,674] [INFO] [timer.py:197:stop] 0/7216, RunningAvgSamplesPerSec=6.336945032001339, CurrSamplesPerSec=5.706284706166854, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:56:27,964] [INFO] [timer.py:197:stop] 0/7218, RunningAvgSamplesPerSec=6.336949196971727, CurrSamplesPerSec=5.70364640091355, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:56:39,222] [INFO] [logging.py:68:log_dist] [Rank 0] step=3610, skipped=5, lr=[3.1022222222222225e-06], mom=[[0.9, 0.999]]
[2022-12-17 08:56:39,224] [INFO] [timer.py:197:stop] 0/7220, RunningAvgSamplesPerSec=6.336958361609847, CurrSamplesPerSec=5.727809557990573, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:56:50,500] [INFO] [timer.py:197:stop] 0/7222, RunningAvgSamplesPerSec=6.336965881887145, CurrSamplesPerSec=5.725152781030868, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:57:01,803] [INFO] [timer.py:197:stop] 0/7224, RunningAvgSamplesPerSec=6.336965178997902, CurrSamplesPerSec=5.697206152662326, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:57:13,080] [INFO] [timer.py:197:stop] 0/7226, RunningAvgSamplesPerSec=6.33696923817337, CurrSamplesPerSec=5.711739030983794, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:57:24,359] [INFO] [timer.py:197:stop] 0/7228, RunningAvgSamplesPerSec=6.33697472615728, CurrSamplesPerSec=5.707027654678077, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:57:35,648] [INFO] [timer.py:197:stop] 0/7230, RunningAvgSamplesPerSec=6.336980410963525, CurrSamplesPerSec=5.710876271826484, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:57:46,953] [INFO] [timer.py:197:stop] 0/7232, RunningAvgSamplesPerSec=6.33698384390746, CurrSamplesPerSec=5.7000343315881015, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:57:58,269] [INFO] [timer.py:197:stop] 0/7234, RunningAvgSamplesPerSec=6.3369858943607245, CurrSamplesPerSec=5.696722287684846, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:58:09,567] [INFO] [timer.py:197:stop] 0/7236, RunningAvgSamplesPerSec=6.336990611755504, CurrSamplesPerSec=5.717968344447178, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:58:20,912] [INFO] [timer.py:197:stop] 0/7238, RunningAvgSamplesPerSec=6.336989861540987, CurrSamplesPerSec=5.703959572002851, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:58:32,520] [INFO] [logging.py:68:log_dist] [Rank 0] step=3620, skipped=5, lr=[3.08e-06], mom=[[0.9, 0.999]]
[2022-12-17 08:58:32,522] [INFO] [timer.py:197:stop] 0/7240, RunningAvgSamplesPerSec=6.336985721996713, CurrSamplesPerSec=5.6804374391771955, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:58:43,801] [INFO] [timer.py:197:stop] 0/7242, RunningAvgSamplesPerSec=6.336987946622146, CurrSamplesPerSec=5.715724470332622, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:58:55,080] [INFO] [timer.py:197:stop] 0/7244, RunningAvgSamplesPerSec=6.336995401497639, CurrSamplesPerSec=5.731321094009947, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:59:06,399] [INFO] [timer.py:197:stop] 0/7246, RunningAvgSamplesPerSec=6.336996297692054, CurrSamplesPerSec=5.692427209007267, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:59:17,689] [INFO] [timer.py:197:stop] 0/7248, RunningAvgSamplesPerSec=6.3370017557976315, CurrSamplesPerSec=5.719900241410128, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:59:29,023] [INFO] [timer.py:197:stop] 0/7250, RunningAvgSamplesPerSec=6.336998937313612, CurrSamplesPerSec=5.69223359147299, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0004, 'learning_rate': 3.068888888888889e-06, 'epoch': 15.36}
[2022-12-17 08:59:40,346] [INFO] [timer.py:197:stop] 0/7252, RunningAvgSamplesPerSec=6.336997740259851, CurrSamplesPerSec=5.705874493208206, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 08:59:51,644] [INFO] [timer.py:197:stop] 0/7254, RunningAvgSamplesPerSec=6.337000250700455, CurrSamplesPerSec=5.706744719407257, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:00:02,933] [INFO] [timer.py:197:stop] 0/7256, RunningAvgSamplesPerSec=6.337003726190621, CurrSamplesPerSec=5.710104869965497, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:00:14,204] [INFO] [timer.py:197:stop] 0/7258, RunningAvgSamplesPerSec=6.33701078437247, CurrSamplesPerSec=5.728722431870606, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:00:25,504] [INFO] [logging.py:68:log_dist] [Rank 0] step=3630, skipped=5, lr=[3.0577777777777778e-06], mom=[[0.9, 0.999]]
[2022-12-17 09:00:25,506] [INFO] [timer.py:197:stop] 0/7260, RunningAvgSamplesPerSec=6.337015045017911, CurrSamplesPerSec=5.700771779147463, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:00:36,775] [INFO] [timer.py:197:stop] 0/7262, RunningAvgSamplesPerSec=6.337021794010251, CurrSamplesPerSec=5.731328925604024, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:00:48,050] [INFO] [timer.py:197:stop] 0/7264, RunningAvgSamplesPerSec=6.337026725477906, CurrSamplesPerSec=5.727018179771592, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:00:59,358] [INFO] [timer.py:197:stop] 0/7266, RunningAvgSamplesPerSec=6.337026294876397, CurrSamplesPerSec=5.686934747484086, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:01:10,668] [INFO] [timer.py:197:stop] 0/7268, RunningAvgSamplesPerSec=6.3370274181598685, CurrSamplesPerSec=5.7002589835173, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:01:21,952] [INFO] [timer.py:197:stop] 0/7270, RunningAvgSamplesPerSec=6.3370293113435805, CurrSamplesPerSec=5.722975740218383, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:01:33,275] [INFO] [timer.py:197:stop] 0/7272, RunningAvgSamplesPerSec=6.337027527195671, CurrSamplesPerSec=5.686863664902051, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:01:44,567] [INFO] [timer.py:197:stop] 0/7274, RunningAvgSamplesPerSec=6.337031035737247, CurrSamplesPerSec=5.712275774378226, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:01:55,858] [INFO] [timer.py:197:stop] 0/7276, RunningAvgSamplesPerSec=6.337034122181147, CurrSamplesPerSec=5.715932347290631, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:02:07,148] [INFO] [timer.py:197:stop] 0/7278, RunningAvgSamplesPerSec=6.337035184613046, CurrSamplesPerSec=5.7143690160321805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:02:18,405] [INFO] [logging.py:68:log_dist] [Rank 0] step=3640, skipped=5, lr=[3.0355555555555562e-06], mom=[[0.9, 0.999]]
[2022-12-17 09:02:18,405] [INFO] [timer.py:197:stop] 0/7280, RunningAvgSamplesPerSec=6.337041606835495, CurrSamplesPerSec=5.735732689648618, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:02:29,718] [INFO] [timer.py:197:stop] 0/7282, RunningAvgSamplesPerSec=6.337040061689705, CurrSamplesPerSec=5.697223322794721, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:02:40,997] [INFO] [timer.py:197:stop] 0/7284, RunningAvgSamplesPerSec=6.337041604685777, CurrSamplesPerSec=5.703056753343804, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:02:52,305] [INFO] [timer.py:197:stop] 0/7286, RunningAvgSamplesPerSec=6.337040531002212, CurrSamplesPerSec=5.691550965411008, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:03:03,804] [INFO] [timer.py:197:stop] 0/7288, RunningAvgSamplesPerSec=6.3370375606154, CurrSamplesPerSec=5.692482013369373, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:03:15,085] [INFO] [timer.py:197:stop] 0/7290, RunningAvgSamplesPerSec=6.3370364233566185, CurrSamplesPerSec=5.685198672295993, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:03:26,380] [INFO] [timer.py:197:stop] 0/7292, RunningAvgSamplesPerSec=6.337039201860143, CurrSamplesPerSec=5.705081161700345, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:03:37,699] [INFO] [timer.py:197:stop] 0/7294, RunningAvgSamplesPerSec=6.337037053966658, CurrSamplesPerSec=5.6884517600571955, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:03:48,993] [INFO] [timer.py:197:stop] 0/7296, RunningAvgSamplesPerSec=6.3370387233533405, CurrSamplesPerSec=5.710163659243045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:04:00,267] [INFO] [timer.py:197:stop] 0/7298, RunningAvgSamplesPerSec=6.337044139345244, CurrSamplesPerSec=5.708209685969152, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:04:11,543] [INFO] [logging.py:68:log_dist] [Rank 0] step=3650, skipped=5, lr=[3.013333333333334e-06], mom=[[0.9, 0.999]]
[2022-12-17 09:04:11,544] [INFO] [timer.py:197:stop] 0/7300, RunningAvgSamplesPerSec=6.337044618092546, CurrSamplesPerSec=5.696785395846913, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0005, 'learning_rate': 3.013333333333334e-06, 'epoch': 15.47}
[2022-12-17 09:04:22,940] [INFO] [timer.py:197:stop] 0/7302, RunningAvgSamplesPerSec=6.337029950830808, CurrSamplesPerSec=5.638348476166631, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:04:34,249] [INFO] [timer.py:197:stop] 0/7304, RunningAvgSamplesPerSec=6.33703051293742, CurrSamplesPerSec=5.69282196884546, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:04:45,585] [INFO] [timer.py:197:stop] 0/7306, RunningAvgSamplesPerSec=6.337026361968943, CurrSamplesPerSec=5.6815235816639715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:04:56,900] [INFO] [timer.py:197:stop] 0/7308, RunningAvgSamplesPerSec=6.33702460748814, CurrSamplesPerSec=5.684987968011892, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:05:08,312] [INFO] [timer.py:197:stop] 0/7310, RunningAvgSamplesPerSec=6.337026326779322, CurrSamplesPerSec=5.708179825763601, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:05:19,627] [INFO] [timer.py:197:stop] 0/7312, RunningAvgSamplesPerSec=6.337026380581206, CurrSamplesPerSec=5.70632522123226, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:05:30,935] [INFO] [timer.py:197:stop] 0/7314, RunningAvgSamplesPerSec=6.337026956602852, CurrSamplesPerSec=5.699977929305063, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:05:42,282] [INFO] [timer.py:197:stop] 0/7316, RunningAvgSamplesPerSec=6.337026503478041, CurrSamplesPerSec=5.697505798420344, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:05:53,591] [INFO] [timer.py:197:stop] 0/7318, RunningAvgSamplesPerSec=6.3370269736787606, CurrSamplesPerSec=5.71353829682849, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:06:04,920] [INFO] [logging.py:68:log_dist] [Rank 0] step=3660, skipped=5, lr=[2.9911111111111115e-06], mom=[[0.9, 0.999]]
[2022-12-17 09:06:04,921] [INFO] [timer.py:197:stop] 0/7320, RunningAvgSamplesPerSec=6.337023615887206, CurrSamplesPerSec=5.6932969604312555, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:06:16,231] [INFO] [timer.py:197:stop] 0/7322, RunningAvgSamplesPerSec=6.3370237639173626, CurrSamplesPerSec=5.698623641485544, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:06:27,522] [INFO] [timer.py:197:stop] 0/7324, RunningAvgSamplesPerSec=6.337026244413775, CurrSamplesPerSec=5.7142408043634685, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:06:38,895] [INFO] [timer.py:197:stop] 0/7326, RunningAvgSamplesPerSec=6.337015878739829, CurrSamplesPerSec=5.653619902831238, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:06:50,192] [INFO] [timer.py:197:stop] 0/7328, RunningAvgSamplesPerSec=6.337016307579867, CurrSamplesPerSec=5.700846841961118, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:07:01,492] [INFO] [timer.py:197:stop] 0/7330, RunningAvgSamplesPerSec=6.337021253894769, CurrSamplesPerSec=5.730574254077691, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:07:12,835] [INFO] [timer.py:197:stop] 0/7332, RunningAvgSamplesPerSec=6.337017987007978, CurrSamplesPerSec=5.69427109805156, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:07:24,340] [INFO] [timer.py:197:stop] 0/7334, RunningAvgSamplesPerSec=6.33701377064967, CurrSamplesPerSec=5.674490623191548, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:07:35,692] [INFO] [timer.py:197:stop] 0/7336, RunningAvgSamplesPerSec=6.3370056649429936, CurrSamplesPerSec=5.662068594283095, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:07:47,021] [INFO] [timer.py:197:stop] 0/7338, RunningAvgSamplesPerSec=6.337001658387362, CurrSamplesPerSec=5.669274554579328, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:07:58,322] [INFO] [logging.py:68:log_dist] [Rank 0] step=3670, skipped=5, lr=[2.968888888888889e-06], mom=[[0.9, 0.999]]
[2022-12-17 09:07:58,324] [INFO] [timer.py:197:stop] 0/7340, RunningAvgSamplesPerSec=6.337001062046699, CurrSamplesPerSec=5.712809943913745, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:08:09,665] [INFO] [timer.py:197:stop] 0/7342, RunningAvgSamplesPerSec=6.336995871237044, CurrSamplesPerSec=5.688212609339006, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:08:20,929] [INFO] [timer.py:197:stop] 0/7344, RunningAvgSamplesPerSec=6.337002330629014, CurrSamplesPerSec=5.729670084605206, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:08:32,215] [INFO] [timer.py:197:stop] 0/7346, RunningAvgSamplesPerSec=6.337005238715324, CurrSamplesPerSec=5.716175782152351, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:08:43,581] [INFO] [timer.py:197:stop] 0/7348, RunningAvgSamplesPerSec=6.336986572370934, CurrSamplesPerSec=5.61405030428149, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:08:54,887] [INFO] [timer.py:197:stop] 0/7350, RunningAvgSamplesPerSec=6.336986932123496, CurrSamplesPerSec=5.699781619332736, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0005, 'learning_rate': 2.957777777777778e-06, 'epoch': 15.57}
[2022-12-17 09:09:06,198] [INFO] [timer.py:197:stop] 0/7352, RunningAvgSamplesPerSec=6.336985037330815, CurrSamplesPerSec=5.692481289075769, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:09:17,522] [INFO] [timer.py:197:stop] 0/7354, RunningAvgSamplesPerSec=6.336985224795389, CurrSamplesPerSec=5.705982438268532, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:09:28,837] [INFO] [timer.py:197:stop] 0/7356, RunningAvgSamplesPerSec=6.336984555662266, CurrSamplesPerSec=5.7004093262177955, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:09:40,143] [INFO] [timer.py:197:stop] 0/7358, RunningAvgSamplesPerSec=6.336987639653612, CurrSamplesPerSec=5.696448835138831, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:09:51,638] [INFO] [logging.py:68:log_dist] [Rank 0] step=3680, skipped=5, lr=[2.946666666666667e-06], mom=[[0.9, 0.999]]
[2022-12-17 09:09:51,639] [INFO] [timer.py:197:stop] 0/7360, RunningAvgSamplesPerSec=6.336993962873559, CurrSamplesPerSec=5.725266829591983, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:10:02,925] [INFO] [timer.py:197:stop] 0/7362, RunningAvgSamplesPerSec=6.336998437932194, CurrSamplesPerSec=5.7177191546152475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:10:14,232] [INFO] [timer.py:197:stop] 0/7364, RunningAvgSamplesPerSec=6.337000172759582, CurrSamplesPerSec=5.7215114782360414, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:10:25,548] [INFO] [timer.py:197:stop] 0/7366, RunningAvgSamplesPerSec=6.337000912192302, CurrSamplesPerSec=5.706275487250284, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:10:36,893] [INFO] [timer.py:197:stop] 0/7368, RunningAvgSamplesPerSec=6.3369977244942906, CurrSamplesPerSec=5.695682534204909, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:10:48,213] [INFO] [timer.py:197:stop] 0/7370, RunningAvgSamplesPerSec=6.336995604641772, CurrSamplesPerSec=5.693942564107785, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:10:59,505] [INFO] [timer.py:197:stop] 0/7372, RunningAvgSamplesPerSec=6.336996428299475, CurrSamplesPerSec=5.698745588142197, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:11:10,854] [INFO] [timer.py:197:stop] 0/7374, RunningAvgSamplesPerSec=6.336987047174027, CurrSamplesPerSec=5.656158488058211, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:11:22,367] [INFO] [timer.py:197:stop] 0/7376, RunningAvgSamplesPerSec=6.336986034032055, CurrSamplesPerSec=5.690582824841769, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:11:33,654] [INFO] [timer.py:197:stop] 0/7378, RunningAvgSamplesPerSec=6.336989122120361, CurrSamplesPerSec=5.701948555590245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:11:44,963] [INFO] [logging.py:68:log_dist] [Rank 0] step=3690, skipped=5, lr=[2.9244444444444447e-06], mom=[[0.9, 0.999]]
[2022-12-17 09:11:44,965] [INFO] [timer.py:197:stop] 0/7380, RunningAvgSamplesPerSec=6.336988774998277, CurrSamplesPerSec=5.695121113964519, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:11:56,248] [INFO] [timer.py:197:stop] 0/7382, RunningAvgSamplesPerSec=6.3369942119196905, CurrSamplesPerSec=5.716110052696255, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:12:07,531] [INFO] [timer.py:197:stop] 0/7384, RunningAvgSamplesPerSec=6.336997165714012, CurrSamplesPerSec=5.709153235126749, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:12:18,813] [INFO] [timer.py:197:stop] 0/7386, RunningAvgSamplesPerSec=6.337000178406842, CurrSamplesPerSec=5.711433997607642, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:12:30,103] [INFO] [timer.py:197:stop] 0/7388, RunningAvgSamplesPerSec=6.337001581801266, CurrSamplesPerSec=5.70525964811784, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:12:41,384] [INFO] [timer.py:197:stop] 0/7390, RunningAvgSamplesPerSec=6.337001689966199, CurrSamplesPerSec=5.6848923736187045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:12:52,662] [INFO] [timer.py:197:stop] 0/7392, RunningAvgSamplesPerSec=6.337008670186226, CurrSamplesPerSec=5.735854268715347, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:13:03,969] [INFO] [timer.py:197:stop] 0/7394, RunningAvgSamplesPerSec=6.337009304181113, CurrSamplesPerSec=5.689410011081015, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:13:15,253] [INFO] [timer.py:197:stop] 0/7396, RunningAvgSamplesPerSec=6.337016834734171, CurrSamplesPerSec=5.726278324108146, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:13:26,531] [INFO] [timer.py:197:stop] 0/7398, RunningAvgSamplesPerSec=6.337020752635934, CurrSamplesPerSec=5.7078642491937, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:13:37,856] [INFO] [logging.py:68:log_dist] [Rank 0] step=3700, skipped=5, lr=[2.9022222222222223e-06], mom=[[0.9, 0.999]]
[2022-12-17 09:13:37,857] [INFO] [timer.py:197:stop] 0/7400, RunningAvgSamplesPerSec=6.337018429161538, CurrSamplesPerSec=5.684988690399899, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0004, 'learning_rate': 2.9022222222222223e-06, 'epoch': 15.68}
[2022-12-17 09:13:49,432] [INFO] [timer.py:197:stop] 0/7402, RunningAvgSamplesPerSec=6.337012024744687, CurrSamplesPerSec=5.651153310359503, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:14:00,776] [INFO] [timer.py:197:stop] 0/7404, RunningAvgSamplesPerSec=6.33700586855207, CurrSamplesPerSec=5.6845610688138315, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:14:12,080] [INFO] [timer.py:197:stop] 0/7406, RunningAvgSamplesPerSec=6.337005576354706, CurrSamplesPerSec=5.698727440987748, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:14:23,372] [INFO] [timer.py:197:stop] 0/7408, RunningAvgSamplesPerSec=6.337009761658866, CurrSamplesPerSec=5.705296510834593, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:14:34,676] [INFO] [timer.py:197:stop] 0/7410, RunningAvgSamplesPerSec=6.337011033623096, CurrSamplesPerSec=5.712568740747563, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:14:46,006] [INFO] [timer.py:197:stop] 0/7412, RunningAvgSamplesPerSec=6.337008244689118, CurrSamplesPerSec=5.688408605366097, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:14:57,348] [INFO] [timer.py:197:stop] 0/7414, RunningAvgSamplesPerSec=6.337007150563991, CurrSamplesPerSec=5.701993853875586, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:15:08,690] [INFO] [timer.py:197:stop] 0/7416, RunningAvgSamplesPerSec=6.337000523489746, CurrSamplesPerSec=5.6616062023435445, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:15:20,019] [INFO] [timer.py:197:stop] 0/7418, RunningAvgSamplesPerSec=6.336998625385598, CurrSamplesPerSec=5.6962121542639474, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:15:31,362] [INFO] [logging.py:68:log_dist] [Rank 0] step=3710, skipped=5, lr=[2.88e-06], mom=[[0.9, 0.999]]
[2022-12-17 09:15:31,364] [INFO] [timer.py:197:stop] 0/7420, RunningAvgSamplesPerSec=6.336993049210901, CurrSamplesPerSec=5.698096959630452, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:15:42,685] [INFO] [timer.py:197:stop] 0/7422, RunningAvgSamplesPerSec=6.336989874096959, CurrSamplesPerSec=5.682819945807793, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:15:54,010] [INFO] [timer.py:197:stop] 0/7424, RunningAvgSamplesPerSec=6.33698803333303, CurrSamplesPerSec=5.687665193832233, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:16:05,350] [INFO] [timer.py:197:stop] 0/7426, RunningAvgSamplesPerSec=6.3369833530157305, CurrSamplesPerSec=5.68367689558089, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:16:16,687] [INFO] [timer.py:197:stop] 0/7428, RunningAvgSamplesPerSec=6.3369800625802, CurrSamplesPerSec=5.6919600869033085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:16:27,978] [INFO] [timer.py:197:stop] 0/7430, RunningAvgSamplesPerSec=6.336984477677266, CurrSamplesPerSec=5.716490330696599, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:16:39,280] [INFO] [timer.py:197:stop] 0/7432, RunningAvgSamplesPerSec=6.336985455005384, CurrSamplesPerSec=5.721349533270731, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:16:50,560] [INFO] [timer.py:197:stop] 0/7434, RunningAvgSamplesPerSec=6.336990721761715, CurrSamplesPerSec=5.732729166972771, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:17:01,850] [INFO] [timer.py:197:stop] 0/7436, RunningAvgSamplesPerSec=6.336993886526433, CurrSamplesPerSec=5.714456358964536, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:17:13,123] [INFO] [timer.py:197:stop] 0/7438, RunningAvgSamplesPerSec=6.336996949363353, CurrSamplesPerSec=5.700044740715134, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:17:24,419] [INFO] [logging.py:68:log_dist] [Rank 0] step=3720, skipped=5, lr=[2.8577777777777784e-06], mom=[[0.9, 0.999]]
[2022-12-17 09:17:24,420] [INFO] [timer.py:197:stop] 0/7440, RunningAvgSamplesPerSec=6.337000812652505, CurrSamplesPerSec=5.7189904204649755, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:17:35,676] [INFO] [timer.py:197:stop] 0/7442, RunningAvgSamplesPerSec=6.337010934063454, CurrSamplesPerSec=5.741173982588318, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:17:46,934] [INFO] [timer.py:197:stop] 0/7444, RunningAvgSamplesPerSec=6.337018496823416, CurrSamplesPerSec=5.733557148901786, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:17:58,194] [INFO] [timer.py:197:stop] 0/7446, RunningAvgSamplesPerSec=6.337025551190863, CurrSamplesPerSec=5.741582655421268, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:18:09,415] [INFO] [timer.py:197:stop] 0/7448, RunningAvgSamplesPerSec=6.3370357472866425, CurrSamplesPerSec=5.736453169862878, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:18:20,674] [INFO] [timer.py:197:stop] 0/7450, RunningAvgSamplesPerSec=6.337047038533585, CurrSamplesPerSec=5.741463535036417, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0004, 'learning_rate': 2.8466666666666672e-06, 'epoch': 15.78}
[2022-12-17 09:18:31,935] [INFO] [timer.py:197:stop] 0/7452, RunningAvgSamplesPerSec=6.3370536854770965, CurrSamplesPerSec=5.724098962654761, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:18:43,232] [INFO] [timer.py:197:stop] 0/7454, RunningAvgSamplesPerSec=6.3370561754818295, CurrSamplesPerSec=5.678187846450097, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:18:54,530] [INFO] [timer.py:197:stop] 0/7456, RunningAvgSamplesPerSec=6.337061318202993, CurrSamplesPerSec=5.726379468785904, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 09:19:05,795] [INFO] [timer.py:197:stop] 0/7458, RunningAvgSamplesPerSec=6.3370713169833355, CurrSamplesPerSec=5.745115777043273, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 
09:19:17,067] [INFO] [logging.py:68:log_dist] [Rank 0] step=3730, skipped=5, lr=[2.835555555555556e-06], mom=[[0.9, 0.999]] [2022-12-17 09:19:17,069] [INFO] [timer.py:197:stop] 0/7460, RunningAvgSamplesPerSec=6.337077577755489, CurrSamplesPerSec=5.742890357244891, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:19:28,385] [INFO] [timer.py:197:stop] 0/7462, RunningAvgSamplesPerSec=6.337077798063677, CurrSamplesPerSec=5.698841890975853, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:19:39,613] [INFO] [timer.py:197:stop] 0/7464, RunningAvgSamplesPerSec=6.3370883772767055, CurrSamplesPerSec=5.72395420281479, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:19:50,877] [INFO] [timer.py:197:stop] 0/7466, RunningAvgSamplesPerSec=6.337093881465289, CurrSamplesPerSec=5.739011490489101, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:20:02,139] [INFO] [timer.py:197:stop] 0/7468, RunningAvgSamplesPerSec=6.337099770288439, CurrSamplesPerSec=5.720503373606978, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:20:13,435] [INFO] [timer.py:197:stop] 0/7470, RunningAvgSamplesPerSec=6.337101437823006, CurrSamplesPerSec=5.701043467475638, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:20:24,750] [INFO] [timer.py:197:stop] 0/7472, RunningAvgSamplesPerSec=6.337101653412334, CurrSamplesPerSec=5.689918926821241, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:20:36,064] [INFO] [timer.py:197:stop] 0/7474, RunningAvgSamplesPerSec=6.337101541849059, CurrSamplesPerSec=5.70705410546408, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:20:47,417] [INFO] [timer.py:197:stop] 0/7476, RunningAvgSamplesPerSec=6.337097632975739, CurrSamplesPerSec=5.686130051900902, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:20:58,700] [INFO] [timer.py:197:stop] 0/7478, RunningAvgSamplesPerSec=6.337102461458638, CurrSamplesPerSec=5.719335499291697, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:21:09,975] 
[INFO] [logging.py:68:log_dist] [Rank 0] step=3740, skipped=5, lr=[2.8133333333333336e-06], mom=[[0.9, 0.999]] [2022-12-17 09:21:09,977] [INFO] [timer.py:197:stop] 0/7480, RunningAvgSamplesPerSec=6.337110095747103, CurrSamplesPerSec=5.7376940261095575, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:21:21,284] [INFO] [timer.py:197:stop] 0/7482, RunningAvgSamplesPerSec=6.3371124749097545, CurrSamplesPerSec=5.720568716500787, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:21:32,583] [INFO] [timer.py:197:stop] 0/7484, RunningAvgSamplesPerSec=6.33711631173176, CurrSamplesPerSec=5.715392239377726, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:21:43,871] [INFO] [timer.py:197:stop] 0/7486, RunningAvgSamplesPerSec=6.337121037137045, CurrSamplesPerSec=5.723318371519071, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:21:55,141] [INFO] [timer.py:197:stop] 0/7488, RunningAvgSamplesPerSec=6.3371289155506805, CurrSamplesPerSec=5.741047266531711, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:22:06,422] [INFO] [timer.py:197:stop] 0/7490, RunningAvgSamplesPerSec=6.337135404090728, CurrSamplesPerSec=5.734627439020378, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:22:17,699] [INFO] [timer.py:197:stop] 0/7492, RunningAvgSamplesPerSec=6.3371398018959875, CurrSamplesPerSec=5.733295577748234, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:22:28,964] [INFO] [timer.py:197:stop] 0/7494, RunningAvgSamplesPerSec=6.337148466188466, CurrSamplesPerSec=5.7207518308463285, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:22:40,250] [INFO] [timer.py:197:stop] 0/7496, RunningAvgSamplesPerSec=6.337152944739803, CurrSamplesPerSec=5.720642350953582, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:22:51,556] [INFO] [timer.py:197:stop] 0/7498, RunningAvgSamplesPerSec=6.337156582906381, CurrSamplesPerSec=5.714792374357171, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:23:02,854] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=3750, skipped=5, lr=[2.7911111111111113e-06], mom=[[0.9, 0.999]] [2022-12-17 09:23:02,856] [INFO] [timer.py:197:stop] 0/7500, RunningAvgSamplesPerSec=6.337159180592687, CurrSamplesPerSec=5.709706009399266, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0004, 'learning_rate': 2.7911111111111113e-06, 'epoch': 15.89} [2022-12-17 09:23:14,199] [INFO] [timer.py:197:stop] 0/7502, RunningAvgSamplesPerSec=6.3371583100622, CurrSamplesPerSec=5.702817826897301, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:23:25,538] [INFO] [timer.py:197:stop] 0/7504, RunningAvgSamplesPerSec=6.337153460940718, CurrSamplesPerSec=5.684490286311387, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:23:36,852] [INFO] [timer.py:197:stop] 0/7506, RunningAvgSamplesPerSec=6.337152546984216, CurrSamplesPerSec=5.708428427570143, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:23:48,184] [INFO] [timer.py:197:stop] 0/7508, RunningAvgSamplesPerSec=6.337149343188377, CurrSamplesPerSec=5.678220036147123, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:23:59,494] [INFO] [timer.py:197:stop] 0/7510, RunningAvgSamplesPerSec=6.3371502493584835, CurrSamplesPerSec=5.700620690961581, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:24:10,828] [INFO] [timer.py:197:stop] 0/7512, RunningAvgSamplesPerSec=6.337148006115891, CurrSamplesPerSec=5.699207532626838, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:24:22,127] [INFO] [timer.py:197:stop] 0/7514, RunningAvgSamplesPerSec=6.337147584269954, CurrSamplesPerSec=5.698130826917132, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:24:33,498] [INFO] [timer.py:197:stop] 0/7516, RunningAvgSamplesPerSec=6.337147803798282, CurrSamplesPerSec=5.695418848667418, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:24:44,813] [INFO] [timer.py:197:stop] 0/7518, RunningAvgSamplesPerSec=6.337146998144934, CurrSamplesPerSec=5.694066967722256, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:24:56,096] [INFO] [logging.py:68:log_dist] [Rank 0] step=3760, skipped=5, lr=[2.7688888888888893e-06], mom=[[0.9, 0.999]] [2022-12-17 09:24:56,098] [INFO] [timer.py:197:stop] 0/7520, RunningAvgSamplesPerSec=6.33715050800144, CurrSamplesPerSec=5.697948432141898, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:25:07,381] [INFO] [timer.py:197:stop] 0/7522, RunningAvgSamplesPerSec=6.337155380512161, CurrSamplesPerSec=5.716553147168683, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:25:18,691] [INFO] [timer.py:197:stop] 0/7524, RunningAvgSamplesPerSec=6.337157343363257, CurrSamplesPerSec=5.6871434272953865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:25:30,005] [INFO] [timer.py:197:stop] 0/7526, RunningAvgSamplesPerSec=6.337156740513725, CurrSamplesPerSec=5.693342604366141, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:25:41,332] [INFO] [timer.py:197:stop] 0/7528, RunningAvgSamplesPerSec=6.337153983110774, CurrSamplesPerSec=5.688243225294438, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:25:52,613] [INFO] [timer.py:197:stop] 0/7530, RunningAvgSamplesPerSec=6.337159190948649, CurrSamplesPerSec=5.726155440575434, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:26:04,062] [INFO] [timer.py:197:stop] 0/7532, RunningAvgSamplesPerSec=6.337157782718983, CurrSamplesPerSec=5.69249263636342, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:26:15,334] [INFO] [timer.py:197:stop] 0/7534, RunningAvgSamplesPerSec=6.337161263495991, CurrSamplesPerSec=5.708837065279713, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:26:26,628] [INFO] [timer.py:197:stop] 0/7536, RunningAvgSamplesPerSec=6.337161522577037, CurrSamplesPerSec=5.694562462074853, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:26:37,945] [INFO] [timer.py:197:stop] 0/7538, RunningAvgSamplesPerSec=6.33716089579459, CurrSamplesPerSec=5.689047554579456, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:26:49,241] [INFO] [logging.py:68:log_dist] [Rank 0] step=3770, skipped=5, lr=[2.746666666666667e-06], mom=[[0.9, 0.999]] [2022-12-17 09:26:49,242] [INFO] [timer.py:197:stop] 0/7540, RunningAvgSamplesPerSec=6.33716380996003, CurrSamplesPerSec=5.709197919404089, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:27:00,533] [INFO] [timer.py:197:stop] 0/7542, RunningAvgSamplesPerSec=6.337165489777445, CurrSamplesPerSec=5.713306516974537, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:27:11,827] [INFO] [timer.py:197:stop] 0/7544, RunningAvgSamplesPerSec=6.337169139932689, CurrSamplesPerSec=5.712809457596547, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:27:23,133] [INFO] [timer.py:197:stop] 0/7546, RunningAvgSamplesPerSec=6.337170620485615, CurrSamplesPerSec=5.73317018880441, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:27:34,433] [INFO] [timer.py:197:stop] 0/7548, RunningAvgSamplesPerSec=6.337173591365556, CurrSamplesPerSec=5.7136450726788635, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:27:45,742] [INFO] [timer.py:197:stop] 0/7550, RunningAvgSamplesPerSec=6.33717364908794, CurrSamplesPerSec=5.71643944549684, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0004, 'learning_rate': 2.7355555555555557e-06, 'epoch': 16.0} [2022-12-17 09:27:54,445] [INFO] [timer.py:197:stop] 0/7552, RunningAvgSamplesPerSec=6.337586792581298, CurrSamplesPerSec=10.26715855822542, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:28:05,728] [INFO] [timer.py:197:stop] 0/7554, RunningAvgSamplesPerSec=6.337591516051901, CurrSamplesPerSec=5.717267355524845, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:28:17,013] [INFO] [timer.py:197:stop] 0/7556, RunningAvgSamplesPerSec=6.337596250854084, CurrSamplesPerSec=5.703340534797298, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:28:28,297] [INFO] [timer.py:197:stop] 0/7558, 
RunningAvgSamplesPerSec=6.337600728695917, CurrSamplesPerSec=5.724571863274335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:28:39,600] [INFO] [logging.py:68:log_dist] [Rank 0] step=3780, skipped=5, lr=[2.7244444444444445e-06], mom=[[0.9, 0.999]] [2022-12-17 09:28:39,601] [INFO] [timer.py:197:stop] 0/7560, RunningAvgSamplesPerSec=6.337601870922737, CurrSamplesPerSec=5.703217906724676, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:28:50,900] [INFO] [timer.py:197:stop] 0/7562, RunningAvgSamplesPerSec=6.337602270427081, CurrSamplesPerSec=5.701695431361674, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:29:02,201] [INFO] [timer.py:197:stop] 0/7564, RunningAvgSamplesPerSec=6.337604562714723, CurrSamplesPerSec=5.7021777190112815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:29:13,539] [INFO] [timer.py:197:stop] 0/7566, RunningAvgSamplesPerSec=6.3376015213372705, CurrSamplesPerSec=5.695463801578938, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:29:24,829] [INFO] [timer.py:197:stop] 0/7568, RunningAvgSamplesPerSec=6.337605068679528, CurrSamplesPerSec=5.717699424969985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:29:36,142] [INFO] [timer.py:197:stop] 0/7570, RunningAvgSamplesPerSec=6.3376098008097355, CurrSamplesPerSec=5.752060729281249, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:29:47,468] [INFO] [timer.py:197:stop] 0/7572, RunningAvgSamplesPerSec=6.337607647820765, CurrSamplesPerSec=5.700316601843686, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:29:58,780] [INFO] [timer.py:197:stop] 0/7574, RunningAvgSamplesPerSec=6.337607350186396, CurrSamplesPerSec=5.697775966126229, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:30:10,113] [INFO] [timer.py:197:stop] 0/7576, RunningAvgSamplesPerSec=6.337607612379269, CurrSamplesPerSec=5.7026789873583805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:30:21,506] [INFO] [timer.py:197:stop] 0/7578, 
RunningAvgSamplesPerSec=6.337595988367939, CurrSamplesPerSec=5.628938079626455, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:30:32,852] [INFO] [logging.py:68:log_dist] [Rank 0] step=3790, skipped=5, lr=[2.702222222222222e-06], mom=[[0.9, 0.999]] [2022-12-17 09:30:32,854] [INFO] [timer.py:197:stop] 0/7580, RunningAvgSamplesPerSec=6.337594479460954, CurrSamplesPerSec=5.707124965658659, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:30:44,120] [INFO] [timer.py:197:stop] 0/7582, RunningAvgSamplesPerSec=6.337595931550774, CurrSamplesPerSec=5.7019829531529265, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:30:55,441] [INFO] [timer.py:197:stop] 0/7584, RunningAvgSamplesPerSec=6.337594274854377, CurrSamplesPerSec=5.700950238170126, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:31:06,750] [INFO] [timer.py:197:stop] 0/7586, RunningAvgSamplesPerSec=6.337596159526293, CurrSamplesPerSec=5.719249225680997, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:31:18,069] [INFO] [timer.py:197:stop] 0/7588, RunningAvgSamplesPerSec=6.337597267773341, CurrSamplesPerSec=5.703781651700679, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:31:29,385] [INFO] [timer.py:197:stop] 0/7590, RunningAvgSamplesPerSec=6.33759872193673, CurrSamplesPerSec=5.701558584081337, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:31:40,639] [INFO] [timer.py:197:stop] 0/7592, RunningAvgSamplesPerSec=6.337606926848249, CurrSamplesPerSec=5.742255717046693, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:31:51,898] [INFO] [timer.py:197:stop] 0/7594, RunningAvgSamplesPerSec=6.337610414269546, CurrSamplesPerSec=5.730849769348819, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:32:03,166] [INFO] [timer.py:197:stop] 0/7596, RunningAvgSamplesPerSec=6.3376154535103, CurrSamplesPerSec=5.706872837563713, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:32:14,441] [INFO] [timer.py:197:stop] 0/7598, 
RunningAvgSamplesPerSec=6.337619653048147, CurrSamplesPerSec=5.705567660454909, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:32:25,749] [INFO] [logging.py:68:log_dist] [Rank 0] step=3800, skipped=5, lr=[2.68e-06], mom=[[0.9, 0.999]] [2022-12-17 09:32:25,750] [INFO] [timer.py:197:stop] 0/7600, RunningAvgSamplesPerSec=6.337619378832625, CurrSamplesPerSec=5.702209454509983, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0004, 'learning_rate': 2.68e-06, 'epoch': 16.1} [2022-12-17 09:32:37,105] [INFO] [timer.py:197:stop] 0/7602, RunningAvgSamplesPerSec=6.337616036998575, CurrSamplesPerSec=5.679462020335036, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:32:48,383] [INFO] [timer.py:197:stop] 0/7604, RunningAvgSamplesPerSec=6.337618781863973, CurrSamplesPerSec=5.723297138899976, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:32:59,664] [INFO] [timer.py:197:stop] 0/7606, RunningAvgSamplesPerSec=6.337624220363036, CurrSamplesPerSec=5.727346386999702, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:33:10,968] [INFO] [timer.py:197:stop] 0/7608, RunningAvgSamplesPerSec=6.337626727971518, CurrSamplesPerSec=5.71430113854971, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:33:22,282] [INFO] [timer.py:197:stop] 0/7610, RunningAvgSamplesPerSec=6.337628377283658, CurrSamplesPerSec=5.720003598608851, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:33:33,565] [INFO] [timer.py:197:stop] 0/7612, RunningAvgSamplesPerSec=6.337630500529432, CurrSamplesPerSec=5.716670992646742, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:33:44,873] [INFO] [timer.py:197:stop] 0/7614, RunningAvgSamplesPerSec=6.337631753825161, CurrSamplesPerSec=5.704467699806805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:33:56,157] [INFO] [timer.py:197:stop] 0/7616, RunningAvgSamplesPerSec=6.337635793398005, CurrSamplesPerSec=5.718958497903887, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:34:07,617] 
[INFO] [timer.py:197:stop] 0/7618, RunningAvgSamplesPerSec=6.337639489002845, CurrSamplesPerSec=5.716124902552305, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:34:18,908] [INFO] [logging.py:68:log_dist] [Rank 0] step=3810, skipped=5, lr=[2.6577777777777782e-06], mom=[[0.9, 0.999]] [2022-12-17 09:34:18,910] [INFO] [timer.py:197:stop] 0/7620, RunningAvgSamplesPerSec=6.337641366147274, CurrSamplesPerSec=5.704450001085912, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:34:30,171] [INFO] [timer.py:197:stop] 0/7622, RunningAvgSamplesPerSec=6.337645186424332, CurrSamplesPerSec=5.700668631537744, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:34:41,463] [INFO] [timer.py:197:stop] 0/7624, RunningAvgSamplesPerSec=6.337648983931281, CurrSamplesPerSec=5.7186176066698735, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:34:52,784] [INFO] [timer.py:197:stop] 0/7626, RunningAvgSamplesPerSec=6.337647700728763, CurrSamplesPerSec=5.692582691973853, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:35:04,083] [INFO] [timer.py:197:stop] 0/7628, RunningAvgSamplesPerSec=6.337648358748499, CurrSamplesPerSec=5.692907929928287, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:35:15,386] [INFO] [timer.py:197:stop] 0/7630, RunningAvgSamplesPerSec=6.337647574349222, CurrSamplesPerSec=5.688455617491631, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:35:26,690] [INFO] [timer.py:197:stop] 0/7632, RunningAvgSamplesPerSec=6.337649406499528, CurrSamplesPerSec=5.729970708860434, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:35:37,991] [INFO] [timer.py:197:stop] 0/7634, RunningAvgSamplesPerSec=6.337650791547396, CurrSamplesPerSec=5.705103471891794, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:35:49,289] [INFO] [timer.py:197:stop] 0/7636, RunningAvgSamplesPerSec=6.3376497688644635, CurrSamplesPerSec=5.694385610703284, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:36:00,627] [INFO] 
[timer.py:197:stop] 0/7638, RunningAvgSamplesPerSec=6.337650054425621, CurrSamplesPerSec=5.708650342395094, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:36:11,924] [INFO] [logging.py:68:log_dist] [Rank 0] step=3820, skipped=5, lr=[2.635555555555556e-06], mom=[[0.9, 0.999]] [2022-12-17 09:36:11,926] [INFO] [timer.py:197:stop] 0/7640, RunningAvgSamplesPerSec=6.337651283414945, CurrSamplesPerSec=5.7059324676879415, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:36:23,223] [INFO] [timer.py:197:stop] 0/7642, RunningAvgSamplesPerSec=6.337653160494952, CurrSamplesPerSec=5.708989803483139, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:36:34,508] [INFO] [timer.py:197:stop] 0/7644, RunningAvgSamplesPerSec=6.337658097402001, CurrSamplesPerSec=5.720769874705262, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:36:45,822] [INFO] [timer.py:197:stop] 0/7646, RunningAvgSamplesPerSec=6.337658170056963, CurrSamplesPerSec=5.7108541594545805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:36:57,126] [INFO] [timer.py:197:stop] 0/7648, RunningAvgSamplesPerSec=6.337659849422356, CurrSamplesPerSec=5.709940412176869, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:37:08,413] [INFO] [timer.py:197:stop] 0/7650, RunningAvgSamplesPerSec=6.337664868797306, CurrSamplesPerSec=5.725524981982122, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0004, 'learning_rate': 2.6244444444444446e-06, 'epoch': 16.21} [2022-12-17 09:37:19,685] [INFO] [timer.py:197:stop] 0/7652, RunningAvgSamplesPerSec=6.337667092967805, CurrSamplesPerSec=5.708157248742415, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:37:31,018] [INFO] [timer.py:197:stop] 0/7654, RunningAvgSamplesPerSec=6.3376648265658115, CurrSamplesPerSec=5.675200838364229, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:37:42,363] [INFO] [timer.py:197:stop] 0/7656, RunningAvgSamplesPerSec=6.337659805455543, CurrSamplesPerSec=5.673781065449453, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:37:53,671] [INFO] [timer.py:197:stop] 0/7658, RunningAvgSamplesPerSec=6.33765906744692, CurrSamplesPerSec=5.708310436122384, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:38:04,934] [INFO] [logging.py:68:log_dist] [Rank 0] step=3830, skipped=5, lr=[2.6133333333333334e-06], mom=[[0.9, 0.999]] [2022-12-17 09:38:04,934] [INFO] [timer.py:197:stop] 0/7660, RunningAvgSamplesPerSec=6.337663550102813, CurrSamplesPerSec=5.717441247099751, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:38:16,207] [INFO] [timer.py:197:stop] 0/7662, RunningAvgSamplesPerSec=6.3376692239885655, CurrSamplesPerSec=5.717891855717364, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:38:27,511] [INFO] [timer.py:197:stop] 0/7664, RunningAvgSamplesPerSec=6.337673040340945, CurrSamplesPerSec=5.71155503484099, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:38:38,957] [INFO] [timer.py:197:stop] 0/7666, RunningAvgSamplesPerSec=6.337682342427288, CurrSamplesPerSec=5.738175554205899, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:38:50,284] [INFO] [timer.py:197:stop] 0/7668, RunningAvgSamplesPerSec=6.3376767349966565, CurrSamplesPerSec=5.668561273086992, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:39:01,591] [INFO] [timer.py:197:stop] 0/7670, RunningAvgSamplesPerSec=6.337678125002861, CurrSamplesPerSec=5.710234110798286, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:39:12,845] [INFO] [timer.py:197:stop] 0/7672, RunningAvgSamplesPerSec=6.337686190629664, CurrSamplesPerSec=5.736208004868038, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:39:24,113] [INFO] [timer.py:197:stop] 0/7674, RunningAvgSamplesPerSec=6.337692827180317, CurrSamplesPerSec=5.722016639575232, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:39:35,394] [INFO] [timer.py:197:stop] 0/7676, RunningAvgSamplesPerSec=6.337697913741528, CurrSamplesPerSec=5.71689793192338, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:39:46,712] [INFO] [timer.py:197:stop] 0/7678, RunningAvgSamplesPerSec=6.337697349513364, CurrSamplesPerSec=5.699035474357622, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:39:57,972] [INFO] [logging.py:68:log_dist] [Rank 0] step=3840, skipped=5, lr=[2.5911111111111115e-06], mom=[[0.9, 0.999]] [2022-12-17 09:39:57,974] [INFO] [timer.py:197:stop] 0/7680, RunningAvgSamplesPerSec=6.3377044241616645, CurrSamplesPerSec=5.7285742595547315, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:40:09,260] [INFO] [timer.py:197:stop] 0/7682, RunningAvgSamplesPerSec=6.337711552344175, CurrSamplesPerSec=5.7456888196643385, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:40:20,570] [INFO] [timer.py:197:stop] 0/7684, RunningAvgSamplesPerSec=6.337709510302982, CurrSamplesPerSec=5.69719889770791, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:40:32,111] [INFO] [timer.py:197:stop] 0/7686, RunningAvgSamplesPerSec=6.337710909695745, CurrSamplesPerSec=5.723016492646554, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:40:43,391] [INFO] [timer.py:197:stop] 0/7688, RunningAvgSamplesPerSec=6.337714796469835, CurrSamplesPerSec=5.713155979588083, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:40:54,674] [INFO] [timer.py:197:stop] 0/7690, RunningAvgSamplesPerSec=6.33771834844587, CurrSamplesPerSec=5.721193206541384, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:41:05,945] [INFO] [timer.py:197:stop] 0/7692, RunningAvgSamplesPerSec=6.337724939606327, CurrSamplesPerSec=5.731976574601302, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:41:17,229] [INFO] [timer.py:197:stop] 0/7694, RunningAvgSamplesPerSec=6.337727410771224, CurrSamplesPerSec=5.7024446958230435, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:41:28,486] [INFO] [timer.py:197:stop] 0/7696, RunningAvgSamplesPerSec=6.33772906309757, CurrSamplesPerSec=5.68873963547455, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:41:39,728] [INFO] [timer.py:197:stop] 0/7698, RunningAvgSamplesPerSec=6.3377401774351805, CurrSamplesPerSec=5.753686191763893, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:41:50,987] [INFO] [logging.py:68:log_dist] [Rank 0] step=3850, skipped=5, lr=[2.568888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 09:41:50,988] [INFO] [timer.py:197:stop] 0/7700, RunningAvgSamplesPerSec=6.337747795509742, CurrSamplesPerSec=5.752716277848127, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0004, 'learning_rate': 2.568888888888889e-06, 'epoch': 16.31} [2022-12-17 09:42:02,271] [INFO] [timer.py:197:stop] 0/7702, RunningAvgSamplesPerSec=6.337754141135462, CurrSamplesPerSec=5.726597894844452, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:42:13,528] [INFO] [timer.py:197:stop] 0/7704, RunningAvgSamplesPerSec=6.337759938911315, CurrSamplesPerSec=5.719015032759193, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:42:24,801] [INFO] [timer.py:197:stop] 0/7706, RunningAvgSamplesPerSec=6.3377667375846105, CurrSamplesPerSec=5.7230045352871075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:42:36,099] [INFO] [timer.py:197:stop] 0/7708, RunningAvgSamplesPerSec=6.3377724764819705, CurrSamplesPerSec=5.71709298748041, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:42:47,308] [INFO] [timer.py:197:stop] 0/7710, RunningAvgSamplesPerSec=6.337785431762316, CurrSamplesPerSec=5.740814968288539, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:42:58,557] [INFO] [timer.py:197:stop] 0/7712, RunningAvgSamplesPerSec=6.337794374665492, CurrSamplesPerSec=5.733962778338685, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:43:09,834] [INFO] [timer.py:197:stop] 0/7714, RunningAvgSamplesPerSec=6.3377989705497155, CurrSamplesPerSec=5.716045541922188, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:43:21,082] [INFO] [timer.py:197:stop] 0/7716, 
RunningAvgSamplesPerSec=6.337809588204668, CurrSamplesPerSec=5.740360248559839, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:43:32,353] [INFO] [timer.py:197:stop] 0/7718, RunningAvgSamplesPerSec=6.337817725751959, CurrSamplesPerSec=5.728971114518445, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:43:43,607] [INFO] [logging.py:68:log_dist] [Rank 0] step=3860, skipped=5, lr=[2.5466666666666667e-06], mom=[[0.9, 0.999]] [2022-12-17 09:43:43,609] [INFO] [timer.py:197:stop] 0/7720, RunningAvgSamplesPerSec=6.337825494644621, CurrSamplesPerSec=5.721367824787967, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:43:54,934] [INFO] [timer.py:197:stop] 0/7722, RunningAvgSamplesPerSec=6.337822737480995, CurrSamplesPerSec=5.683201341972973, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:44:06,208] [INFO] [timer.py:197:stop] 0/7724, RunningAvgSamplesPerSec=6.337829661991261, CurrSamplesPerSec=5.716317227330523, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:44:17,509] [INFO] [timer.py:197:stop] 0/7726, RunningAvgSamplesPerSec=6.337831393646679, CurrSamplesPerSec=5.712366943264196, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:44:28,735] [INFO] [timer.py:197:stop] 0/7728, RunningAvgSamplesPerSec=6.337842682733193, CurrSamplesPerSec=5.748709634408516, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:44:40,031] [INFO] [timer.py:197:stop] 0/7730, RunningAvgSamplesPerSec=6.337845932926097, CurrSamplesPerSec=5.708473100502343, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:44:51,331] [INFO] [timer.py:197:stop] 0/7732, RunningAvgSamplesPerSec=6.337849432167574, CurrSamplesPerSec=5.712740158242432, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:45:02,541] [INFO] [timer.py:197:stop] 0/7734, RunningAvgSamplesPerSec=6.337862880216422, CurrSamplesPerSec=5.74519471732774, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:45:13,775] [INFO] [timer.py:197:stop] 0/7736, 
RunningAvgSamplesPerSec=6.337872213835417, CurrSamplesPerSec=5.744988641020949, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:45:25,099] [INFO] [timer.py:197:stop] 0/7738, RunningAvgSamplesPerSec=6.337871356149751, CurrSamplesPerSec=5.6864791268958745, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:45:36,327] [INFO] [logging.py:68:log_dist] [Rank 0] step=3870, skipped=5, lr=[2.5244444444444447e-06], mom=[[0.9, 0.999]] [2022-12-17 09:45:36,328] [INFO] [timer.py:197:stop] 0/7740, RunningAvgSamplesPerSec=6.337880462359573, CurrSamplesPerSec=5.733127822295487, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:45:47,545] [INFO] [timer.py:197:stop] 0/7742, RunningAvgSamplesPerSec=6.337888450710517, CurrSamplesPerSec=5.737256477188804, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:45:58,848] [INFO] [timer.py:197:stop] 0/7744, RunningAvgSamplesPerSec=6.337887695526293, CurrSamplesPerSec=5.702028251984804, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:46:10,119] [INFO] [timer.py:197:stop] 0/7746, RunningAvgSamplesPerSec=6.33789452116026, CurrSamplesPerSec=5.7230445560333365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:46:21,408] [INFO] [timer.py:197:stop] 0/7748, RunningAvgSamplesPerSec=6.337898703797568, CurrSamplesPerSec=5.709751916821519, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:46:32,706] [INFO] [timer.py:197:stop] 0/7750, RunningAvgSamplesPerSec=6.337900434978689, CurrSamplesPerSec=5.708300482334463, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0004, 'learning_rate': 2.5133333333333336e-06, 'epoch': 16.42} [2022-12-17 09:46:43,980] [INFO] [timer.py:197:stop] 0/7752, RunningAvgSamplesPerSec=6.337906537766826, CurrSamplesPerSec=5.7340281841099365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:46:55,267] [INFO] [timer.py:197:stop] 0/7754, RunningAvgSamplesPerSec=6.337911639304656, CurrSamplesPerSec=5.730378032888709, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 09:47:06,521] [INFO] [timer.py:197:stop] 0/7756, RunningAvgSamplesPerSec=6.337919004617038, CurrSamplesPerSec=5.733004154853874, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:47:17,798] [INFO] [timer.py:197:stop] 0/7758, RunningAvgSamplesPerSec=6.337926206357161, CurrSamplesPerSec=5.724206621853158, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:47:29,117] [INFO] [logging.py:68:log_dist] [Rank 0] step=3880, skipped=5, lr=[2.5022222222222224e-06], mom=[[0.9, 0.999]] [2022-12-17 09:47:29,119] [INFO] [timer.py:197:stop] 0/7760, RunningAvgSamplesPerSec=6.337930726765982, CurrSamplesPerSec=5.728880882035514, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:47:40,387] [INFO] [timer.py:197:stop] 0/7762, RunningAvgSamplesPerSec=6.337939116628576, CurrSamplesPerSec=5.725962453227796, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:47:51,660] [INFO] [timer.py:197:stop] 0/7764, RunningAvgSamplesPerSec=6.33794288021509, CurrSamplesPerSec=5.707184178951615, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:48:02,949] [INFO] [timer.py:197:stop] 0/7766, RunningAvgSamplesPerSec=6.337948337593651, CurrSamplesPerSec=5.722645837736495, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:48:14,247] [INFO] [timer.py:197:stop] 0/7768, RunningAvgSamplesPerSec=6.337953765039836, CurrSamplesPerSec=5.714051538449913, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:48:25,521] [INFO] [timer.py:197:stop] 0/7770, RunningAvgSamplesPerSec=6.33796009831342, CurrSamplesPerSec=5.717910855911447, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:48:36,804] [INFO] [timer.py:197:stop] 0/7772, RunningAvgSamplesPerSec=6.337966152970789, CurrSamplesPerSec=5.726004713901616, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:48:48,058] [INFO] [timer.py:197:stop] 0/7774, RunningAvgSamplesPerSec=6.337976220513363, CurrSamplesPerSec=5.741714553242598, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 09:48:59,319] [INFO] [timer.py:197:stop] 0/7776, RunningAvgSamplesPerSec=6.337985161512416, CurrSamplesPerSec=5.738442723091509, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:49:10,550] [INFO] [timer.py:197:stop] 0/7778, RunningAvgSamplesPerSec=6.337993629409863, CurrSamplesPerSec=5.737967528077485, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:49:21,821] [INFO] [logging.py:68:log_dist] [Rank 0] step=3890, skipped=5, lr=[2.4800000000000004e-06], mom=[[0.9, 0.999]] [2022-12-17 09:49:21,822] [INFO] [timer.py:197:stop] 0/7780, RunningAvgSamplesPerSec=6.33799976467172, CurrSamplesPerSec=5.717746922494803, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:49:33,124] [INFO] [timer.py:197:stop] 0/7782, RunningAvgSamplesPerSec=6.338001283847643, CurrSamplesPerSec=5.689798322443387, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:49:44,402] [INFO] [timer.py:197:stop] 0/7784, RunningAvgSamplesPerSec=6.33800723289743, CurrSamplesPerSec=5.7221383697188966, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:49:55,649] [INFO] [timer.py:197:stop] 0/7786, RunningAvgSamplesPerSec=6.3380153543891415, CurrSamplesPerSec=5.725017003230027, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:50:06,914] [INFO] [timer.py:197:stop] 0/7788, RunningAvgSamplesPerSec=6.3380232131434076, CurrSamplesPerSec=5.736539963293558, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:50:18,235] [INFO] [timer.py:197:stop] 0/7790, RunningAvgSamplesPerSec=6.338021456688827, CurrSamplesPerSec=5.689594512665814, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:50:29,538] [INFO] [timer.py:197:stop] 0/7792, RunningAvgSamplesPerSec=6.338022973657751, CurrSamplesPerSec=5.690050391416976, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:50:40,864] [INFO] [timer.py:197:stop] 0/7794, RunningAvgSamplesPerSec=6.338024076398991, CurrSamplesPerSec=5.691871257887523, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 09:50:52,147] [INFO] [timer.py:197:stop] 0/7796, RunningAvgSamplesPerSec=6.338026648126305, CurrSamplesPerSec=5.698377344512056, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:51:03,459] [INFO] [timer.py:197:stop] 0/7798, RunningAvgSamplesPerSec=6.33802682372121, CurrSamplesPerSec=5.704365145730466, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:51:14,724] [INFO] [logging.py:68:log_dist] [Rank 0] step=3900, skipped=5, lr=[2.457777777777778e-06], mom=[[0.9, 0.999]] [2022-12-17 09:51:14,726] [INFO] [timer.py:197:stop] 0/7800, RunningAvgSamplesPerSec=6.338033175561862, CurrSamplesPerSec=5.723243447922379, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0004, 'learning_rate': 2.457777777777778e-06, 'epoch': 16.53} [2022-12-17 09:51:26,011] [INFO] [timer.py:197:stop] 0/7802, RunningAvgSamplesPerSec=6.338037345556911, CurrSamplesPerSec=5.70654624468292, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:51:37,311] [INFO] [timer.py:197:stop] 0/7804, RunningAvgSamplesPerSec=6.338038981952448, CurrSamplesPerSec=5.70911413695785, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:51:48,585] [INFO] [timer.py:197:stop] 0/7806, RunningAvgSamplesPerSec=6.3380445896973985, CurrSamplesPerSec=5.721638552918464, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:51:59,954] [INFO] [timer.py:197:stop] 0/7808, RunningAvgSamplesPerSec=6.338035527632987, CurrSamplesPerSec=5.628274561626016, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:52:11,222] [INFO] [timer.py:197:stop] 0/7810, RunningAvgSamplesPerSec=6.3380415482884045, CurrSamplesPerSec=5.719435667681271, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:52:22,498] [INFO] [timer.py:197:stop] 0/7812, RunningAvgSamplesPerSec=6.338044134123395, CurrSamplesPerSec=5.724492512077532, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:52:33,735] [INFO] [timer.py:197:stop] 0/7814, 
RunningAvgSamplesPerSec=6.338052879421057, CurrSamplesPerSec=5.733898598845058, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:52:44,969] [INFO] [timer.py:197:stop] 0/7816, RunningAvgSamplesPerSec=6.338062080626535, CurrSamplesPerSec=5.728037382499521, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:52:56,251] [INFO] [timer.py:197:stop] 0/7818, RunningAvgSamplesPerSec=6.33806568143163, CurrSamplesPerSec=5.7105181210758404, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:53:07,526] [INFO] [logging.py:68:log_dist] [Rank 0] step=3910, skipped=5, lr=[2.4355555555555556e-06], mom=[[0.9, 0.999]] [2022-12-17 09:53:07,528] [INFO] [timer.py:197:stop] 0/7820, RunningAvgSamplesPerSec=6.33807126366113, CurrSamplesPerSec=5.728249334232319, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:53:18,806] [INFO] [timer.py:197:stop] 0/7822, RunningAvgSamplesPerSec=6.338075961797504, CurrSamplesPerSec=5.7384353627338704, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:53:30,101] [INFO] [timer.py:197:stop] 0/7824, RunningAvgSamplesPerSec=6.338077998092185, CurrSamplesPerSec=5.723391344595013, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:53:41,400] [INFO] [timer.py:197:stop] 0/7826, RunningAvgSamplesPerSec=6.338081091758629, CurrSamplesPerSec=5.7249855017470415, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:53:52,680] [INFO] [timer.py:197:stop] 0/7828, RunningAvgSamplesPerSec=6.338085376762099, CurrSamplesPerSec=5.725090752243677, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:54:03,987] [INFO] [timer.py:197:stop] 0/7830, RunningAvgSamplesPerSec=6.33808473099834, CurrSamplesPerSec=5.700445399961691, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:54:15,313] [INFO] [timer.py:197:stop] 0/7832, RunningAvgSamplesPerSec=6.338078814876594, CurrSamplesPerSec=5.667699780297962, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:54:26,688] [INFO] [timer.py:197:stop] 0/7834, 
RunningAvgSamplesPerSec=6.338073448259471, CurrSamplesPerSec=5.684280356739311, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:54:38,001] [INFO] [timer.py:197:stop] 0/7836, RunningAvgSamplesPerSec=6.338072425133514, CurrSamplesPerSec=5.704623356301539, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:54:49,313] [INFO] [timer.py:197:stop] 0/7838, RunningAvgSamplesPerSec=6.338070988951457, CurrSamplesPerSec=5.680800482440264, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:55:00,628] [INFO] [logging.py:68:log_dist] [Rank 0] step=3920, skipped=5, lr=[2.4133333333333337e-06], mom=[[0.9, 0.999]] [2022-12-17 09:55:00,630] [INFO] [timer.py:197:stop] 0/7840, RunningAvgSamplesPerSec=6.338069360016892, CurrSamplesPerSec=5.7083890965455835, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:55:11,939] [INFO] [timer.py:197:stop] 0/7842, RunningAvgSamplesPerSec=6.338068903567028, CurrSamplesPerSec=5.700775169038363, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:55:23,211] [INFO] [timer.py:197:stop] 0/7844, RunningAvgSamplesPerSec=6.338071817909554, CurrSamplesPerSec=5.725986392723651, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:55:34,467] [INFO] [timer.py:197:stop] 0/7846, RunningAvgSamplesPerSec=6.338080453163421, CurrSamplesPerSec=5.7490021637294015, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:55:45,759] [INFO] [timer.py:197:stop] 0/7848, RunningAvgSamplesPerSec=6.338083687906425, CurrSamplesPerSec=5.71646500971841, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:55:57,056] [INFO] [timer.py:197:stop] 0/7850, RunningAvgSamplesPerSec=6.338086576282932, CurrSamplesPerSec=5.7301632325805985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0004, 'learning_rate': 2.4022222222222225e-06, 'epoch': 16.63} [2022-12-17 09:56:08,361] [INFO] [timer.py:197:stop] 0/7852, RunningAvgSamplesPerSec=6.33808753398256, CurrSamplesPerSec=5.713277819413265, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 09:56:19,624] [INFO] [timer.py:197:stop] 0/7854, RunningAvgSamplesPerSec=6.33809495047022, CurrSamplesPerSec=5.737719290220738, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:56:30,898] [INFO] [timer.py:197:stop] 0/7856, RunningAvgSamplesPerSec=6.338100536942322, CurrSamplesPerSec=5.726581524528675, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:56:42,177] [INFO] [timer.py:197:stop] 0/7858, RunningAvgSamplesPerSec=6.338105385584027, CurrSamplesPerSec=5.7167106813704445, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:56:53,435] [INFO] [logging.py:68:log_dist] [Rank 0] step=3930, skipped=5, lr=[2.3911111111111113e-06], mom=[[0.9, 0.999]] [2022-12-17 09:56:53,437] [INFO] [timer.py:197:stop] 0/7860, RunningAvgSamplesPerSec=6.338108012024771, CurrSamplesPerSec=5.7049114159581595, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:57:04,775] [INFO] [timer.py:197:stop] 0/7862, RunningAvgSamplesPerSec=6.338111543072177, CurrSamplesPerSec=5.713287060969322, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:57:16,102] [INFO] [timer.py:197:stop] 0/7864, RunningAvgSamplesPerSec=6.338110406074132, CurrSamplesPerSec=5.695581504230151, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:57:27,393] [INFO] [timer.py:197:stop] 0/7866, RunningAvgSamplesPerSec=6.338113753744858, CurrSamplesPerSec=5.711950993677978, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:57:38,762] [INFO] [timer.py:197:stop] 0/7868, RunningAvgSamplesPerSec=6.338113012863938, CurrSamplesPerSec=5.679170036729898, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:57:50,071] [INFO] [timer.py:197:stop] 0/7870, RunningAvgSamplesPerSec=6.3381116004514855, CurrSamplesPerSec=5.700562098019049, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:58:01,489] [INFO] [timer.py:197:stop] 0/7872, RunningAvgSamplesPerSec=6.338111905656464, CurrSamplesPerSec=5.707745795611867, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 09:58:12,762] [INFO] [timer.py:197:stop] 0/7874, RunningAvgSamplesPerSec=6.338112775738269, CurrSamplesPerSec=5.715166393060473, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:58:24,035] [INFO] [timer.py:197:stop] 0/7876, RunningAvgSamplesPerSec=6.338115497792184, CurrSamplesPerSec=5.718213901715182, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:58:35,415] [INFO] [timer.py:197:stop] 0/7878, RunningAvgSamplesPerSec=6.338118462172044, CurrSamplesPerSec=5.709587722247295, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:58:46,800] [INFO] [logging.py:68:log_dist] [Rank 0] step=3940, skipped=5, lr=[2.3688888888888893e-06], mom=[[0.9, 0.999]] [2022-12-17 09:58:46,802] [INFO] [timer.py:197:stop] 0/7880, RunningAvgSamplesPerSec=6.3381177443984775, CurrSamplesPerSec=5.683592897785402, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:58:58,116] [INFO] [timer.py:197:stop] 0/7882, RunningAvgSamplesPerSec=6.338118981307474, CurrSamplesPerSec=5.693691116312811, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:59:09,381] [INFO] [timer.py:197:stop] 0/7884, RunningAvgSamplesPerSec=6.338126813691846, CurrSamplesPerSec=5.74215327376515, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:59:20,662] [INFO] [timer.py:197:stop] 0/7886, RunningAvgSamplesPerSec=6.338132665058401, CurrSamplesPerSec=5.708553464983033, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:59:31,934] [INFO] [timer.py:197:stop] 0/7888, RunningAvgSamplesPerSec=6.3381354312682605, CurrSamplesPerSec=5.701538481405924, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:59:43,231] [INFO] [timer.py:197:stop] 0/7890, RunningAvgSamplesPerSec=6.33813710920125, CurrSamplesPerSec=5.703827706300002, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 09:59:54,487] [INFO] [timer.py:197:stop] 0/7892, RunningAvgSamplesPerSec=6.338145160230165, CurrSamplesPerSec=5.736119995746772, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 10:00:05,845] [INFO] [timer.py:197:stop] 0/7894, RunningAvgSamplesPerSec=6.338134902893071, CurrSamplesPerSec=5.653856153556985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:00:17,120] [INFO] [timer.py:197:stop] 0/7896, RunningAvgSamplesPerSec=6.33814083501362, CurrSamplesPerSec=5.713846717230511, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:00:28,404] [INFO] [timer.py:197:stop] 0/7898, RunningAvgSamplesPerSec=6.338142276531903, CurrSamplesPerSec=5.710470743574421, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:00:39,716] [INFO] [logging.py:68:log_dist] [Rank 0] step=3950, skipped=5, lr=[2.346666666666667e-06], mom=[[0.9, 0.999]] [2022-12-17 10:00:39,717] [INFO] [timer.py:197:stop] 0/7900, RunningAvgSamplesPerSec=6.338144709674223, CurrSamplesPerSec=5.706045509182841, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0004, 'learning_rate': 2.346666666666667e-06, 'epoch': 16.74} [2022-12-17 10:00:51,009] [INFO] [timer.py:197:stop] 0/7902, RunningAvgSamplesPerSec=6.338149983020205, CurrSamplesPerSec=5.717792715727425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:01:02,335] [INFO] [timer.py:197:stop] 0/7904, RunningAvgSamplesPerSec=6.338149141416174, CurrSamplesPerSec=5.701027485092423, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:01:13,708] [INFO] [timer.py:197:stop] 0/7906, RunningAvgSamplesPerSec=6.33814149997887, CurrSamplesPerSec=5.634362080092918, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:01:25,007] [INFO] [timer.py:197:stop] 0/7908, RunningAvgSamplesPerSec=6.338143007536694, CurrSamplesPerSec=5.710456408998133, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:01:36,262] [INFO] [timer.py:197:stop] 0/7910, RunningAvgSamplesPerSec=6.338148563510418, CurrSamplesPerSec=5.737118653290545, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:01:47,491] [INFO] [timer.py:197:stop] 0/7912, RunningAvgSamplesPerSec=6.338160609062336, 
CurrSamplesPerSec=5.749935600230909, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:01:58,735] [INFO] [timer.py:197:stop] 0/7914, RunningAvgSamplesPerSec=6.338165702227535, CurrSamplesPerSec=5.708400750125987, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:02:09,952] [INFO] [timer.py:197:stop] 0/7916, RunningAvgSamplesPerSec=6.338175063089888, CurrSamplesPerSec=5.747496251131897, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:02:21,213] [INFO] [timer.py:197:stop] 0/7918, RunningAvgSamplesPerSec=6.3381825449010805, CurrSamplesPerSec=5.728961577617624, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:02:32,519] [INFO] [logging.py:68:log_dist] [Rank 0] step=3960, skipped=5, lr=[2.3244444444444445e-06], mom=[[0.9, 0.999]] [2022-12-17 10:02:32,521] [INFO] [timer.py:197:stop] 0/7920, RunningAvgSamplesPerSec=6.338182967900777, CurrSamplesPerSec=5.685274770482056, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:02:43,763] [INFO] [timer.py:197:stop] 0/7922, RunningAvgSamplesPerSec=6.3381882149312325, CurrSamplesPerSec=5.7287571532247386, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:02:55,040] [INFO] [timer.py:197:stop] 0/7924, RunningAvgSamplesPerSec=6.338192504869388, CurrSamplesPerSec=5.718937541301829, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:03:06,297] [INFO] [timer.py:197:stop] 0/7926, RunningAvgSamplesPerSec=6.338198398399984, CurrSamplesPerSec=5.70923726157512, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:03:17,576] [INFO] [timer.py:197:stop] 0/7928, RunningAvgSamplesPerSec=6.338203036589526, CurrSamplesPerSec=5.726441036860908, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:03:28,838] [INFO] [timer.py:197:stop] 0/7930, RunningAvgSamplesPerSec=6.33820765643813, CurrSamplesPerSec=5.714428379740114, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:03:40,116] [INFO] [timer.py:197:stop] 0/7932, RunningAvgSamplesPerSec=6.338211739957927, 
CurrSamplesPerSec=5.719892928542276, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:03:51,385] [INFO] [timer.py:197:stop] 0/7934, RunningAvgSamplesPerSec=6.338217423452864, CurrSamplesPerSec=5.711872235060416, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:04:02,675] [INFO] [timer.py:197:stop] 0/7936, RunningAvgSamplesPerSec=6.338219974967238, CurrSamplesPerSec=5.700439589394914, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:04:13,955] [INFO] [timer.py:197:stop] 0/7938, RunningAvgSamplesPerSec=6.338221058392447, CurrSamplesPerSec=5.6929115519410205, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:04:25,215] [INFO] [logging.py:68:log_dist] [Rank 0] step=3970, skipped=5, lr=[2.302222222222222e-06], mom=[[0.9, 0.999]] [2022-12-17 10:04:25,217] [INFO] [timer.py:197:stop] 0/7940, RunningAvgSamplesPerSec=6.33822818491925, CurrSamplesPerSec=5.719371325538942, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:04:36,505] [INFO] [timer.py:197:stop] 0/7942, RunningAvgSamplesPerSec=6.3382306066497565, CurrSamplesPerSec=5.705062974173377, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:04:47,796] [INFO] [timer.py:197:stop] 0/7944, RunningAvgSamplesPerSec=6.338232647273031, CurrSamplesPerSec=5.712902345683689, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:04:59,081] [INFO] [timer.py:197:stop] 0/7946, RunningAvgSamplesPerSec=6.338235937628303, CurrSamplesPerSec=5.716276813708189, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:05:10,371] [INFO] [timer.py:197:stop] 0/7948, RunningAvgSamplesPerSec=6.338235903439613, CurrSamplesPerSec=5.702641673870506, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:05:21,748] [INFO] [timer.py:197:stop] 0/7950, RunningAvgSamplesPerSec=6.338230165437593, CurrSamplesPerSec=5.684356430343305, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0004, 'learning_rate': 2.2911111111111114e-06, 'epoch': 16.84} [2022-12-17 10:05:33,034] [INFO] 
[timer.py:197:stop] 0/7952, RunningAvgSamplesPerSec=6.3382333553649195, CurrSamplesPerSec=5.722647789710324, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:05:44,320] [INFO] [timer.py:197:stop] 0/7954, RunningAvgSamplesPerSec=6.338236371939146, CurrSamplesPerSec=5.698118247592226, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:05:55,639] [INFO] [timer.py:197:stop] 0/7956, RunningAvgSamplesPerSec=6.338234088343676, CurrSamplesPerSec=5.708029801166079, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:06:06,899] [INFO] [timer.py:197:stop] 0/7958, RunningAvgSamplesPerSec=6.338239428620746, CurrSamplesPerSec=5.725561130059419, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:06:18,199] [INFO] [logging.py:68:log_dist] [Rank 0] step=3980, skipped=5, lr=[2.28e-06], mom=[[0.9, 0.999]] [2022-12-17 10:06:18,200] [INFO] [timer.py:197:stop] 0/7960, RunningAvgSamplesPerSec=6.338242054976651, CurrSamplesPerSec=5.724104821554558, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:06:29,479] [INFO] [timer.py:197:stop] 0/7962, RunningAvgSamplesPerSec=6.338247887723113, CurrSamplesPerSec=5.722088847612678, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:06:40,729] [INFO] [timer.py:197:stop] 0/7964, RunningAvgSamplesPerSec=6.338257991239867, CurrSamplesPerSec=5.744080655070806, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:06:51,967] [INFO] [timer.py:197:stop] 0/7966, RunningAvgSamplesPerSec=6.338269079031337, CurrSamplesPerSec=5.737984699474484, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:07:03,226] [INFO] [timer.py:197:stop] 0/7968, RunningAvgSamplesPerSec=6.338276621632127, CurrSamplesPerSec=5.748057952642601, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:07:14,518] [INFO] [timer.py:197:stop] 0/7970, RunningAvgSamplesPerSec=6.338279152405919, CurrSamplesPerSec=5.720762803449684, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:07:25,805] [INFO] [timer.py:197:stop] 
0/7972, RunningAvgSamplesPerSec=6.33828324681685, CurrSamplesPerSec=5.7059846214691925, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:07:37,059] [INFO] [timer.py:197:stop] 0/7974, RunningAvgSamplesPerSec=6.338292838997039, CurrSamplesPerSec=5.754532575012987, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:07:48,363] [INFO] [timer.py:197:stop] 0/7976, RunningAvgSamplesPerSec=6.338298034828886, CurrSamplesPerSec=5.725873536852861, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:07:59,624] [INFO] [timer.py:197:stop] 0/7978, RunningAvgSamplesPerSec=6.338305476156251, CurrSamplesPerSec=5.74477913706752, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:08:10,871] [INFO] [logging.py:68:log_dist] [Rank 0] step=3990, skipped=5, lr=[2.257777777777778e-06], mom=[[0.9, 0.999]] [2022-12-17 10:08:10,873] [INFO] [timer.py:197:stop] 0/7980, RunningAvgSamplesPerSec=6.338315193684523, CurrSamplesPerSec=5.74299135241991, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:08:22,111] [INFO] [timer.py:197:stop] 0/7982, RunningAvgSamplesPerSec=6.338326127329389, CurrSamplesPerSec=5.745274397571756, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:08:33,411] [INFO] [timer.py:197:stop] 0/7984, RunningAvgSamplesPerSec=6.3383327882824245, CurrSamplesPerSec=5.714127924469036, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:08:44,654] [INFO] [timer.py:197:stop] 0/7986, RunningAvgSamplesPerSec=6.338340140310366, CurrSamplesPerSec=5.730535351302664, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:08:55,959] [INFO] [timer.py:197:stop] 0/7988, RunningAvgSamplesPerSec=6.33834081377446, CurrSamplesPerSec=5.7012589962436975, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:09:07,169] [INFO] [timer.py:197:stop] 0/7990, RunningAvgSamplesPerSec=6.338350998917928, CurrSamplesPerSec=5.740558872668938, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:09:18,394] [INFO] [timer.py:197:stop] 0/7992, 
RunningAvgSamplesPerSec=6.338359022801522, CurrSamplesPerSec=5.732135204831851, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:09:29,644] [INFO] [timer.py:197:stop] 0/7994, RunningAvgSamplesPerSec=6.338368416212484, CurrSamplesPerSec=5.744597923555065, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:09:40,900] [INFO] [timer.py:197:stop] 0/7996, RunningAvgSamplesPerSec=6.33837681122492, CurrSamplesPerSec=5.742089402106264, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:09:52,156] [INFO] [timer.py:197:stop] 0/7998, RunningAvgSamplesPerSec=6.33838585503386, CurrSamplesPerSec=5.741089504595729, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:10:03,430] [INFO] [logging.py:68:log_dist] [Rank 0] step=4000, skipped=5, lr=[2.235555555555556e-06], mom=[[0.9, 0.999]] [2022-12-17 10:10:03,432] [INFO] [timer.py:197:stop] 0/8000, RunningAvgSamplesPerSec=6.338391427625841, CurrSamplesPerSec=5.722572395688769, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0004, 'learning_rate': 2.235555555555556e-06, 'epoch': 16.95} {'eval_loss': 0.20751953125, 'eval_wer': 9.117695806707058, 'eval_runtime': 2104.1262, 'eval_samples_per_second': 3.666, 'eval_steps_per_second': 0.459, 'epoch': 16.95} [2022-12-17 10:45:11,140] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step4000 is begin to save! [2022-12-17 10:45:11,149] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-4000/global_step4000/mp_rank_00_model_states.pt [2022-12-17 10:45:11,149] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving ./checkpoint-4000/global_step4000/mp_rank_00_model_states.pt... [2022-12-17 10:45:14,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved ./checkpoint-4000/global_step4000/mp_rank_00_model_states.pt. [2022-12-17 10:45:14,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving ./checkpoint-4000/global_step4000/zero_pp_rank_0_mp_rank_00_optim_states.pt... 
[2022-12-17 10:45:29,453] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved ./checkpoint-4000/global_step4000/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2022-12-17 10:45:29,453] [INFO] [engine.py:3269:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-4000/global_step4000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-12-17 10:45:29,453] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! [2022-12-17 10:47:43,406] [INFO] [timer.py:197:stop] 0/8002, RunningAvgSamplesPerSec=6.338349624432332, CurrSamplesPerSec=5.420656353971379, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:47:54,680] [INFO] [timer.py:197:stop] 0/8004, RunningAvgSamplesPerSec=6.33835143065796, CurrSamplesPerSec=5.704160291062447, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:48:06,006] [INFO] [timer.py:197:stop] 0/8006, RunningAvgSamplesPerSec=6.338344500819827, CurrSamplesPerSec=5.702807649947566, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:48:17,251] [INFO] [timer.py:197:stop] 0/8008, RunningAvgSamplesPerSec=6.338348296361045, CurrSamplesPerSec=5.70014399244045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:48:28,495] [INFO] [timer.py:197:stop] 0/8010, RunningAvgSamplesPerSec=6.3383562678162235, CurrSamplesPerSec=5.714695531689848, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:48:39,732] [INFO] [timer.py:197:stop] 0/8012, RunningAvgSamplesPerSec=6.33836165307344, CurrSamplesPerSec=5.715190485728729, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:48:50,957] [INFO] [timer.py:197:stop] 0/8014, RunningAvgSamplesPerSec=6.338372401860596, CurrSamplesPerSec=5.731869846856297, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:49:02,219] [INFO] [timer.py:197:stop] 0/8016, RunningAvgSamplesPerSec=6.338376282487288, CurrSamplesPerSec=5.72263388192585, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:49:13,499] [INFO] [timer.py:197:stop] 0/8018, 
RunningAvgSamplesPerSec=6.338378207077379, CurrSamplesPerSec=5.699282312244174, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:49:24,759] [INFO] [logging.py:68:log_dist] [Rank 0] step=4010, skipped=5, lr=[2.2133333333333335e-06], mom=[[0.9, 0.999]] [2022-12-17 10:49:24,761] [INFO] [timer.py:197:stop] 0/8020, RunningAvgSamplesPerSec=6.338380369110319, CurrSamplesPerSec=5.700426273557393, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:49:36,071] [INFO] [timer.py:197:stop] 0/8022, RunningAvgSamplesPerSec=6.338378251809593, CurrSamplesPerSec=5.692219348309159, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:49:44,558] [INFO] [timer.py:197:stop] 0/8024, RunningAvgSamplesPerSec=6.338766498759093, CurrSamplesPerSec=10.175358309906693, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:49:55,831] [INFO] [timer.py:197:stop] 0/8026, RunningAvgSamplesPerSec=6.338767264312376, CurrSamplesPerSec=5.688479003299899, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:50:07,070] [INFO] [timer.py:197:stop] 0/8028, RunningAvgSamplesPerSec=6.338773488763729, CurrSamplesPerSec=5.724551109671964, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:50:18,301] [INFO] [timer.py:197:stop] 0/8030, RunningAvgSamplesPerSec=6.338779926930835, CurrSamplesPerSec=5.722825424903103, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:50:29,539] [INFO] [timer.py:197:stop] 0/8032, RunningAvgSamplesPerSec=6.33878562512091, CurrSamplesPerSec=5.724370925896205, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:50:40,772] [INFO] [timer.py:197:stop] 0/8034, RunningAvgSamplesPerSec=6.338794929840726, CurrSamplesPerSec=5.743888915707282, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:50:52,047] [INFO] [timer.py:197:stop] 0/8036, RunningAvgSamplesPerSec=6.338797520759499, CurrSamplesPerSec=5.707314258368477, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:51:03,314] [INFO] [timer.py:197:stop] 0/8038, 
RunningAvgSamplesPerSec=6.338802433467725, CurrSamplesPerSec=5.713592292317813, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:51:14,578] [INFO] [logging.py:68:log_dist] [Rank 0] step=4020, skipped=5, lr=[2.1911111111111115e-06], mom=[[0.9, 0.999]] [2022-12-17 10:51:14,580] [INFO] [timer.py:197:stop] 0/8040, RunningAvgSamplesPerSec=6.338805872784326, CurrSamplesPerSec=5.714824493755569, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:51:25,902] [INFO] [timer.py:197:stop] 0/8042, RunningAvgSamplesPerSec=6.338803333984638, CurrSamplesPerSec=5.678797831205772, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:51:37,186] [INFO] [timer.py:197:stop] 0/8044, RunningAvgSamplesPerSec=6.338804543158869, CurrSamplesPerSec=5.717682131195379, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:51:48,453] [INFO] [timer.py:197:stop] 0/8046, RunningAvgSamplesPerSec=6.338806879460148, CurrSamplesPerSec=5.693272568998906, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:51:59,734] [INFO] [timer.py:197:stop] 0/8048, RunningAvgSamplesPerSec=6.338810120073018, CurrSamplesPerSec=5.700095576532893, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:52:11,006] [INFO] [timer.py:197:stop] 0/8050, RunningAvgSamplesPerSec=6.338812691972016, CurrSamplesPerSec=5.702963458180577, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0004, 'learning_rate': 2.1800000000000003e-06, 'epoch': 17.06} [2022-12-17 10:52:22,315] [INFO] [timer.py:197:stop] 0/8052, RunningAvgSamplesPerSec=6.338810008624044, CurrSamplesPerSec=5.671229755707918, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:52:33,585] [INFO] [timer.py:197:stop] 0/8054, RunningAvgSamplesPerSec=6.338810694836544, CurrSamplesPerSec=5.713058462833503, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:52:44,839] [INFO] [timer.py:197:stop] 0/8056, RunningAvgSamplesPerSec=6.338814151111343, CurrSamplesPerSec=5.7207464664777845, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 10:52:56,121] [INFO] [timer.py:197:stop] 0/8058, RunningAvgSamplesPerSec=6.3388157502265425, CurrSamplesPerSec=5.721475381275096, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:53:07,427] [INFO] [logging.py:68:log_dist] [Rank 0] step=4030, skipped=5, lr=[2.168888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 10:53:07,429] [INFO] [timer.py:197:stop] 0/8060, RunningAvgSamplesPerSec=6.338812634700473, CurrSamplesPerSec=5.691154933906094, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:53:18,719] [INFO] [timer.py:197:stop] 0/8062, RunningAvgSamplesPerSec=6.338812535242969, CurrSamplesPerSec=5.710401257994357, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:53:30,002] [INFO] [timer.py:197:stop] 0/8064, RunningAvgSamplesPerSec=6.338814281507109, CurrSamplesPerSec=5.702056109912625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:53:41,286] [INFO] [timer.py:197:stop] 0/8066, RunningAvgSamplesPerSec=6.33881451595564, CurrSamplesPerSec=5.683698075932731, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:53:52,607] [INFO] [timer.py:197:stop] 0/8068, RunningAvgSamplesPerSec=6.338811931154509, CurrSamplesPerSec=5.694774601571177, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:54:03,876] [INFO] [timer.py:197:stop] 0/8070, RunningAvgSamplesPerSec=6.3388150547450985, CurrSamplesPerSec=5.7143904257518585, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:54:15,345] [INFO] [timer.py:197:stop] 0/8072, RunningAvgSamplesPerSec=6.338817286845124, CurrSamplesPerSec=5.707756232922582, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:54:26,606] [INFO] [timer.py:197:stop] 0/8074, RunningAvgSamplesPerSec=6.338822134613462, CurrSamplesPerSec=5.726801676459754, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:54:37,842] [INFO] [timer.py:197:stop] 0/8076, RunningAvgSamplesPerSec=6.338828168085962, CurrSamplesPerSec=5.734015445790293, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 10:54:49,098] [INFO] [timer.py:197:stop] 0/8078, RunningAvgSamplesPerSec=6.338831253328031, CurrSamplesPerSec=5.718133508620822, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:55:00,394] [INFO] [logging.py:68:log_dist] [Rank 0] step=4040, skipped=5, lr=[2.1466666666666667e-06], mom=[[0.9, 0.999]] [2022-12-17 10:55:00,395] [INFO] [timer.py:197:stop] 0/8080, RunningAvgSamplesPerSec=6.338832635103738, CurrSamplesPerSec=5.707157241604528, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:55:11,720] [INFO] [timer.py:197:stop] 0/8082, RunningAvgSamplesPerSec=6.338828706895809, CurrSamplesPerSec=5.695046685152615, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:55:22,963] [INFO] [timer.py:197:stop] 0/8084, RunningAvgSamplesPerSec=6.338832031374013, CurrSamplesPerSec=5.717097614428014, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:55:34,239] [INFO] [timer.py:197:stop] 0/8086, RunningAvgSamplesPerSec=6.338835473233115, CurrSamplesPerSec=5.721566844054441, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:55:45,473] [INFO] [timer.py:197:stop] 0/8088, RunningAvgSamplesPerSec=6.338840393624562, CurrSamplesPerSec=5.726270750616645, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:55:56,756] [INFO] [timer.py:197:stop] 0/8090, RunningAvgSamplesPerSec=6.338841210521914, CurrSamplesPerSec=5.708622662798983, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:56:08,018] [INFO] [timer.py:197:stop] 0/8092, RunningAvgSamplesPerSec=6.338846009986815, CurrSamplesPerSec=5.719069862889136, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:56:19,320] [INFO] [timer.py:197:stop] 0/8094, RunningAvgSamplesPerSec=6.3388423110670855, CurrSamplesPerSec=5.697636888234431, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:56:30,592] [INFO] [timer.py:197:stop] 0/8096, RunningAvgSamplesPerSec=6.3388448983325505, CurrSamplesPerSec=5.708167202030818, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 10:56:41,883] [INFO] [timer.py:197:stop] 0/8098, RunningAvgSamplesPerSec=6.338842357992624, CurrSamplesPerSec=5.695432141121127, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:56:53,150] [INFO] [logging.py:68:log_dist] [Rank 0] step=4050, skipped=5, lr=[2.1244444444444443e-06], mom=[[0.9, 0.999]] [2022-12-17 10:56:53,152] [INFO] [timer.py:197:stop] 0/8100, RunningAvgSamplesPerSec=6.338845647073789, CurrSamplesPerSec=5.717637557582573, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 2.1244444444444443e-06, 'epoch': 17.16} [2022-12-17 10:57:04,428] [INFO] [timer.py:197:stop] 0/8102, RunningAvgSamplesPerSec=6.338848160233028, CurrSamplesPerSec=5.732973789802865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:57:15,740] [INFO] [timer.py:197:stop] 0/8104, RunningAvgSamplesPerSec=6.338846048444852, CurrSamplesPerSec=5.697548365669088, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:57:27,008] [INFO] [timer.py:197:stop] 0/8106, RunningAvgSamplesPerSec=6.3388497642169925, CurrSamplesPerSec=5.718600794644177, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:57:38,278] [INFO] [timer.py:197:stop] 0/8108, RunningAvgSamplesPerSec=6.338852471241099, CurrSamplesPerSec=5.73913419021571, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:57:49,561] [INFO] [timer.py:197:stop] 0/8110, RunningAvgSamplesPerSec=6.338853272107325, CurrSamplesPerSec=5.7126093450966895, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:58:00,798] [INFO] [timer.py:197:stop] 0/8112, RunningAvgSamplesPerSec=6.338861996910889, CurrSamplesPerSec=5.7258691399609205, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:58:12,086] [INFO] [timer.py:197:stop] 0/8114, RunningAvgSamplesPerSec=6.3388632967592065, CurrSamplesPerSec=5.715007970574066, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:58:23,350] [INFO] [timer.py:197:stop] 0/8116, 
RunningAvgSamplesPerSec=6.338868725063488, CurrSamplesPerSec=5.731447870570084, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:58:34,647] [INFO] [timer.py:197:stop] 0/8118, RunningAvgSamplesPerSec=6.338865962296623, CurrSamplesPerSec=5.6913957801372606, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:58:45,904] [INFO] [logging.py:68:log_dist] [Rank 0] step=4060, skipped=5, lr=[2.1022222222222224e-06], mom=[[0.9, 0.999]] [2022-12-17 10:58:45,906] [INFO] [timer.py:197:stop] 0/8120, RunningAvgSamplesPerSec=6.338870234253123, CurrSamplesPerSec=5.701580382323306, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:58:57,197] [INFO] [timer.py:197:stop] 0/8122, RunningAvgSamplesPerSec=6.338871197907672, CurrSamplesPerSec=5.69734617690224, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:59:08,469] [INFO] [timer.py:197:stop] 0/8124, RunningAvgSamplesPerSec=6.338874408163104, CurrSamplesPerSec=5.712023677168763, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:59:19,768] [INFO] [timer.py:197:stop] 0/8126, RunningAvgSamplesPerSec=6.338873435771249, CurrSamplesPerSec=5.6937959439141, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:59:31,053] [INFO] [timer.py:197:stop] 0/8128, RunningAvgSamplesPerSec=6.338875103479378, CurrSamplesPerSec=5.706707110055808, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:59:42,333] [INFO] [timer.py:197:stop] 0/8130, RunningAvgSamplesPerSec=6.338876937697654, CurrSamplesPerSec=5.723553161097039, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 10:59:53,604] [INFO] [timer.py:197:stop] 0/8132, RunningAvgSamplesPerSec=6.33888008476501, CurrSamplesPerSec=5.7310925187864905, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:00:04,854] [INFO] [timer.py:197:stop] 0/8134, RunningAvgSamplesPerSec=6.338886804346215, CurrSamplesPerSec=5.73368304465005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:00:16,108] [INFO] [timer.py:197:stop] 0/8136, 
RunningAvgSamplesPerSec=6.338893059648211, CurrSamplesPerSec=5.728716563514304, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:00:27,360] [INFO] [timer.py:197:stop] 0/8138, RunningAvgSamplesPerSec=6.338897099643674, CurrSamplesPerSec=5.70207137132762, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:00:38,615] [INFO] [logging.py:68:log_dist] [Rank 0] step=4070, skipped=5, lr=[2.08e-06], mom=[[0.9, 0.999]] [2022-12-17 11:00:38,617] [INFO] [timer.py:197:stop] 0/8140, RunningAvgSamplesPerSec=6.338901736596038, CurrSamplesPerSec=5.715131836254571, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:00:49,916] [INFO] [timer.py:197:stop] 0/8142, RunningAvgSamplesPerSec=6.338901148214492, CurrSamplesPerSec=5.674539084926821, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:01:01,172] [INFO] [timer.py:197:stop] 0/8144, RunningAvgSamplesPerSec=6.338908038076983, CurrSamplesPerSec=5.732348194412151, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:01:12,415] [INFO] [timer.py:197:stop] 0/8146, RunningAvgSamplesPerSec=6.338913779001088, CurrSamplesPerSec=5.734613962983128, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:01:23,683] [INFO] [timer.py:197:stop] 0/8148, RunningAvgSamplesPerSec=6.338917733451932, CurrSamplesPerSec=5.725038492812677, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:01:34,947] [INFO] [timer.py:197:stop] 0/8150, RunningAvgSamplesPerSec=6.338923103422685, CurrSamplesPerSec=5.71891390438847, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 2.0688888888888892e-06, 'epoch': 17.27} [2022-12-17 11:01:46,198] [INFO] [timer.py:197:stop] 0/8152, RunningAvgSamplesPerSec=6.338929776354999, CurrSamplesPerSec=5.732722800691356, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:01:57,467] [INFO] [timer.py:197:stop] 0/8154, RunningAvgSamplesPerSec=6.338933563760563, CurrSamplesPerSec=5.711915503414802, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 
11:02:08,758] [INFO] [timer.py:197:stop] 0/8156, RunningAvgSamplesPerSec=6.338935132196591, CurrSamplesPerSec=5.678368257595655, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:02:20,051] [INFO] [timer.py:197:stop] 0/8158, RunningAvgSamplesPerSec=6.338935200431204, CurrSamplesPerSec=5.697341581869769, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:02:31,296] [INFO] [logging.py:68:log_dist] [Rank 0] step=4080, skipped=5, lr=[2.057777777777778e-06], mom=[[0.9, 0.999]] [2022-12-17 11:02:31,298] [INFO] [timer.py:197:stop] 0/8160, RunningAvgSamplesPerSec=6.338942180948522, CurrSamplesPerSec=5.748176362106808, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:02:42,577] [INFO] [timer.py:197:stop] 0/8162, RunningAvgSamplesPerSec=6.338944245421692, CurrSamplesPerSec=5.714502829412263, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:02:53,889] [INFO] [timer.py:197:stop] 0/8164, RunningAvgSamplesPerSec=6.338941353879358, CurrSamplesPerSec=5.699720623180004, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:03:05,171] [INFO] [timer.py:197:stop] 0/8166, RunningAvgSamplesPerSec=6.338942619512358, CurrSamplesPerSec=5.715760494764467, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:03:16,451] [INFO] [timer.py:197:stop] 0/8168, RunningAvgSamplesPerSec=6.33894495321946, CurrSamplesPerSec=5.719275546167675, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:03:27,745] [INFO] [timer.py:197:stop] 0/8170, RunningAvgSamplesPerSec=6.3389440244462625, CurrSamplesPerSec=5.709750945228217, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:03:39,039] [INFO] [timer.py:197:stop] 0/8172, RunningAvgSamplesPerSec=6.338941089178579, CurrSamplesPerSec=5.703327690107661, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:03:50,320] [INFO] [timer.py:197:stop] 0/8174, RunningAvgSamplesPerSec=6.33894121579318, CurrSamplesPerSec=5.728734657651518, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:04:01,599] 
[INFO] [timer.py:197:stop] 0/8176, RunningAvgSamplesPerSec=6.338943687836185, CurrSamplesPerSec=5.71622909715484, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:04:12,834] [INFO] [timer.py:197:stop] 0/8178, RunningAvgSamplesPerSec=6.3389499263487465, CurrSamplesPerSec=5.733750648539528, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:04:24,098] [INFO] [logging.py:68:log_dist] [Rank 0] step=4090, skipped=5, lr=[2.0355555555555556e-06], mom=[[0.9, 0.999]] [2022-12-17 11:04:24,100] [INFO] [timer.py:197:stop] 0/8180, RunningAvgSamplesPerSec=6.338953945781538, CurrSamplesPerSec=5.720305891098483, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:04:35,394] [INFO] [timer.py:197:stop] 0/8182, RunningAvgSamplesPerSec=6.338951432192114, CurrSamplesPerSec=5.683273536283522, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:04:46,666] [INFO] [timer.py:197:stop] 0/8184, RunningAvgSamplesPerSec=6.338951766298605, CurrSamplesPerSec=5.697556105237196, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:04:57,934] [INFO] [timer.py:197:stop] 0/8186, RunningAvgSamplesPerSec=6.338953882850323, CurrSamplesPerSec=5.713337160451459, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:05:09,215] [INFO] [timer.py:197:stop] 0/8188, RunningAvgSamplesPerSec=6.33895551009071, CurrSamplesPerSec=5.703192703189134, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:05:20,485] [INFO] [timer.py:197:stop] 0/8190, RunningAvgSamplesPerSec=6.338958504561117, CurrSamplesPerSec=5.73846210545695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:05:31,725] [INFO] [timer.py:197:stop] 0/8192, RunningAvgSamplesPerSec=6.338965343487257, CurrSamplesPerSec=5.738406902861988, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:05:42,966] [INFO] [timer.py:197:stop] 0/8194, RunningAvgSamplesPerSec=6.33897323043952, CurrSamplesPerSec=5.73368133007938, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:05:54,353] [INFO] 
[timer.py:197:stop] 0/8196, RunningAvgSamplesPerSec=6.338978358488804, CurrSamplesPerSec=5.716910594302126, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:06:05,609] [INFO] [timer.py:197:stop] 0/8198, RunningAvgSamplesPerSec=6.338984368170587, CurrSamplesPerSec=5.726013263824789, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:06:16,838] [INFO] [logging.py:68:log_dist] [Rank 0] step=4100, skipped=5, lr=[2.0133333333333337e-06], mom=[[0.9, 0.999]] [2022-12-17 11:06:16,839] [INFO] [timer.py:197:stop] 0/8200, RunningAvgSamplesPerSec=6.338994041470853, CurrSamplesPerSec=5.74776994998559, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 2.0133333333333337e-06, 'epoch': 17.37} [2022-12-17 11:06:28,044] [INFO] [timer.py:197:stop] 0/8202, RunningAvgSamplesPerSec=6.339006186191332, CurrSamplesPerSec=5.744074263552449, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:06:39,300] [INFO] [timer.py:197:stop] 0/8204, RunningAvgSamplesPerSec=6.339013783921928, CurrSamplesPerSec=5.754720584334838, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:06:50,550] [INFO] [timer.py:197:stop] 0/8206, RunningAvgSamplesPerSec=6.339022799293814, CurrSamplesPerSec=5.765227145289706, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:07:01,784] [INFO] [timer.py:197:stop] 0/8208, RunningAvgSamplesPerSec=6.339034291148339, CurrSamplesPerSec=5.757240403557556, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:07:13,076] [INFO] [timer.py:197:stop] 0/8210, RunningAvgSamplesPerSec=6.339035110256033, CurrSamplesPerSec=5.716284604241528, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:07:24,329] [INFO] [timer.py:197:stop] 0/8212, RunningAvgSamplesPerSec=6.339041049331456, CurrSamplesPerSec=5.71850577200943, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:07:35,570] [INFO] [timer.py:197:stop] 0/8214, RunningAvgSamplesPerSec=6.339050077138531, CurrSamplesPerSec=5.737041896588219, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:07:46,813] [INFO] [timer.py:197:stop] 0/8216, RunningAvgSamplesPerSec=6.339055662711915, CurrSamplesPerSec=5.7142352089188195, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:07:58,065] [INFO] [timer.py:197:stop] 0/8218, RunningAvgSamplesPerSec=6.339061990192898, CurrSamplesPerSec=5.724786977121834, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:08:09,376] [INFO] [logging.py:68:log_dist] [Rank 0] step=4110, skipped=5, lr=[1.9911111111111113e-06], mom=[[0.9, 0.999]] [2022-12-17 11:08:09,379] [INFO] [timer.py:197:stop] 0/8220, RunningAvgSamplesPerSec=6.339059931735823, CurrSamplesPerSec=5.666828021862285, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:08:20,617] [INFO] [timer.py:197:stop] 0/8222, RunningAvgSamplesPerSec=6.339068523169324, CurrSamplesPerSec=5.748742382423336, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:08:31,901] [INFO] [timer.py:197:stop] 0/8224, RunningAvgSamplesPerSec=6.339071114500694, CurrSamplesPerSec=5.707910612487633, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:08:43,166] [INFO] [timer.py:197:stop] 0/8226, RunningAvgSamplesPerSec=6.339076373811075, CurrSamplesPerSec=5.712450091825509, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:08:54,414] [INFO] [timer.py:197:stop] 0/8228, RunningAvgSamplesPerSec=6.339083077897917, CurrSamplesPerSec=5.72001334948079, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:09:05,654] [INFO] [timer.py:197:stop] 0/8230, RunningAvgSamplesPerSec=6.339091960303741, CurrSamplesPerSec=5.7456059304629505, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:09:16,998] [INFO] [timer.py:197:stop] 0/8232, RunningAvgSamplesPerSec=6.3390833115626615, CurrSamplesPerSec=5.637741229429793, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:09:28,254] [INFO] [timer.py:197:stop] 0/8234, RunningAvgSamplesPerSec=6.339088383880457, CurrSamplesPerSec=5.717522351409421, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:09:39,502] [INFO] [timer.py:197:stop] 0/8236, RunningAvgSamplesPerSec=6.3390943973054705, CurrSamplesPerSec=5.717494342153484, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:09:50,792] [INFO] [timer.py:197:stop] 0/8238, RunningAvgSamplesPerSec=6.3390989670440625, CurrSamplesPerSec=5.703746020494652, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:10:02,085] [INFO] [logging.py:68:log_dist] [Rank 0] step=4120, skipped=5, lr=[1.968888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 11:10:02,086] [INFO] [timer.py:197:stop] 0/8240, RunningAvgSamplesPerSec=6.3390987022636445, CurrSamplesPerSec=5.698852295748182, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:10:13,351] [INFO] [timer.py:197:stop] 0/8242, RunningAvgSamplesPerSec=6.339103412629639, CurrSamplesPerSec=5.716625460863346, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:10:24,583] [INFO] [timer.py:197:stop] 0/8244, RunningAvgSamplesPerSec=6.339112582276825, CurrSamplesPerSec=5.723448943387044, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:10:35,845] [INFO] [timer.py:197:stop] 0/8246, RunningAvgSamplesPerSec=6.339117191533311, CurrSamplesPerSec=5.7132306393035375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:10:47,096] [INFO] [timer.py:197:stop] 0/8248, RunningAvgSamplesPerSec=6.339120428724326, CurrSamplesPerSec=5.703754019298071, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:10:58,338] [INFO] [timer.py:197:stop] 0/8250, RunningAvgSamplesPerSec=6.339127459602533, CurrSamplesPerSec=5.727030398279421, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 1.9577777777777777e-06, 'epoch': 17.48} [2022-12-17 11:11:09,580] [INFO] [timer.py:197:stop] 0/8252, RunningAvgSamplesPerSec=6.339135984734979, CurrSamplesPerSec=5.71758251134771, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:11:20,882] [INFO] [timer.py:197:stop] 0/8254, 
RunningAvgSamplesPerSec=6.339141798644105, CurrSamplesPerSec=5.715823295197673, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:11:32,199] [INFO] [timer.py:197:stop] 0/8256, RunningAvgSamplesPerSec=6.339145620009977, CurrSamplesPerSec=5.708334228243922, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:11:43,485] [INFO] [timer.py:197:stop] 0/8258, RunningAvgSamplesPerSec=6.339152792660803, CurrSamplesPerSec=5.72381237944853, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:11:54,974] [INFO] [logging.py:68:log_dist] [Rank 0] step=4130, skipped=5, lr=[1.9466666666666665e-06], mom=[[0.9, 0.999]] [2022-12-17 11:11:54,976] [INFO] [timer.py:197:stop] 0/8260, RunningAvgSamplesPerSec=6.339150139457296, CurrSamplesPerSec=5.680538173020475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:12:06,285] [INFO] [timer.py:197:stop] 0/8262, RunningAvgSamplesPerSec=6.339155182167034, CurrSamplesPerSec=5.733205943686423, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:12:17,576] [INFO] [timer.py:197:stop] 0/8264, RunningAvgSamplesPerSec=6.3391629960552365, CurrSamplesPerSec=5.724178547068688, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:12:28,895] [INFO] [timer.py:197:stop] 0/8266, RunningAvgSamplesPerSec=6.339167342913138, CurrSamplesPerSec=5.713661369116393, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:12:40,176] [INFO] [timer.py:197:stop] 0/8268, RunningAvgSamplesPerSec=6.339176790809956, CurrSamplesPerSec=5.727440970450266, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:12:51,495] [INFO] [timer.py:197:stop] 0/8270, RunningAvgSamplesPerSec=6.339181293898531, CurrSamplesPerSec=5.7165577732425215, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:13:02,808] [INFO] [timer.py:197:stop] 0/8272, RunningAvgSamplesPerSec=6.339186260828034, CurrSamplesPerSec=5.712118484501903, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:13:14,095] [INFO] [timer.py:197:stop] 0/8274, 
RunningAvgSamplesPerSec=6.339192071129688, CurrSamplesPerSec=5.730050456633666, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:13:25,408] [INFO] [timer.py:197:stop] 0/8276, RunningAvgSamplesPerSec=6.339196894368015, CurrSamplesPerSec=5.720743052793949, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:13:36,739] [INFO] [timer.py:197:stop] 0/8278, RunningAvgSamplesPerSec=6.33919919016503, CurrSamplesPerSec=5.720215931365112, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:13:48,081] [INFO] [logging.py:68:log_dist] [Rank 0] step=4140, skipped=5, lr=[1.9244444444444446e-06], mom=[[0.9, 0.999]] [2022-12-17 11:13:48,082] [INFO] [timer.py:197:stop] 0/8280, RunningAvgSamplesPerSec=6.33920117072856, CurrSamplesPerSec=5.712191901573292, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:13:59,391] [INFO] [timer.py:197:stop] 0/8282, RunningAvgSamplesPerSec=6.339206953617565, CurrSamplesPerSec=5.726021813773496, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:14:10,634] [INFO] [timer.py:197:stop] 0/8284, RunningAvgSamplesPerSec=6.3392097988073655, CurrSamplesPerSec=5.726001049656643, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:14:21,913] [INFO] [timer.py:197:stop] 0/8286, RunningAvgSamplesPerSec=6.339212154446398, CurrSamplesPerSec=5.7209666576913625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:14:33,192] [INFO] [timer.py:197:stop] 0/8288, RunningAvgSamplesPerSec=6.339214232019025, CurrSamplesPerSec=5.711508126228083, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:14:44,487] [INFO] [timer.py:197:stop] 0/8290, RunningAvgSamplesPerSec=6.339214198475859, CurrSamplesPerSec=5.6819615706288, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:14:55,774] [INFO] [timer.py:197:stop] 0/8292, RunningAvgSamplesPerSec=6.339216904861013, CurrSamplesPerSec=5.723860222690163, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:15:07,076] [INFO] [timer.py:197:stop] 0/8294, 
RunningAvgSamplesPerSec=6.3392155917844235, CurrSamplesPerSec=5.684246653894753, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:15:18,387] [INFO] [timer.py:197:stop] 0/8296, RunningAvgSamplesPerSec=6.33921313178604, CurrSamplesPerSec=5.666598102286572, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:15:29,673] [INFO] [timer.py:197:stop] 0/8298, RunningAvgSamplesPerSec=6.339215239750418, CurrSamplesPerSec=5.6977292835724915, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:15:40,954] [INFO] [logging.py:68:log_dist] [Rank 0] step=4150, skipped=5, lr=[1.9022222222222222e-06], mom=[[0.9, 0.999]] [2022-12-17 11:15:40,955] [INFO] [timer.py:197:stop] 0/8300, RunningAvgSamplesPerSec=6.339217975605579, CurrSamplesPerSec=5.693590156972587, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 1.9022222222222222e-06, 'epoch': 17.58} [2022-12-17 11:15:52,248] [INFO] [timer.py:197:stop] 0/8302, RunningAvgSamplesPerSec=6.339218437141733, CurrSamplesPerSec=5.710349266393622, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:16:03,530] [INFO] [timer.py:197:stop] 0/8304, RunningAvgSamplesPerSec=6.339219993659909, CurrSamplesPerSec=5.706298049388331, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:16:14,856] [INFO] [timer.py:197:stop] 0/8306, RunningAvgSamplesPerSec=6.339216303753219, CurrSamplesPerSec=5.689630932039807, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:16:26,150] [INFO] [timer.py:197:stop] 0/8308, RunningAvgSamplesPerSec=6.339218495615838, CurrSamplesPerSec=5.706676780294752, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:16:37,440] [INFO] [timer.py:197:stop] 0/8310, RunningAvgSamplesPerSec=6.339219412467867, CurrSamplesPerSec=5.721004699086555, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:16:48,713] [INFO] [timer.py:197:stop] 0/8312, RunningAvgSamplesPerSec=6.33922233962562, CurrSamplesPerSec=5.711818272076358, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-17 11:16:59,959] [INFO] [timer.py:197:stop] 0/8314, RunningAvgSamplesPerSec=6.339227335618701, CurrSamplesPerSec=5.726106581789273, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:17:11,236] [INFO] [timer.py:197:stop] 0/8316, RunningAvgSamplesPerSec=6.339230585670635, CurrSamplesPerSec=5.717236669874342, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:17:22,486] [INFO] [timer.py:197:stop] 0/8318, RunningAvgSamplesPerSec=6.339234875250686, CurrSamplesPerSec=5.723407208553411, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:17:33,817] [INFO] [logging.py:68:log_dist] [Rank 0] step=4160, skipped=5, lr=[1.8800000000000002e-06], mom=[[0.9, 0.999]] [2022-12-17 11:17:33,818] [INFO] [timer.py:197:stop] 0/8320, RunningAvgSamplesPerSec=6.339230314931957, CurrSamplesPerSec=5.68304252094583, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:17:45,121] [INFO] [timer.py:197:stop] 0/8322, RunningAvgSamplesPerSec=6.33922837184202, CurrSamplesPerSec=5.679600933362707, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:17:56,405] [INFO] [timer.py:197:stop] 0/8324, RunningAvgSamplesPerSec=6.339230634670554, CurrSamplesPerSec=5.709936282634974, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:18:07,697] [INFO] [timer.py:197:stop] 0/8326, RunningAvgSamplesPerSec=6.339231705867129, CurrSamplesPerSec=5.701697369069446, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:18:19,024] [INFO] [timer.py:197:stop] 0/8328, RunningAvgSamplesPerSec=6.339227192482196, CurrSamplesPerSec=5.668277111901843, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:18:30,339] [INFO] [timer.py:197:stop] 0/8330, RunningAvgSamplesPerSec=6.339224302880751, CurrSamplesPerSec=5.6813662973292764, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:18:41,624] [INFO] [timer.py:197:stop] 0/8332, RunningAvgSamplesPerSec=6.33922615975293, CurrSamplesPerSec=5.70432708286264, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 
11:18:52,934] [INFO] [timer.py:197:stop] 0/8334, RunningAvgSamplesPerSec=6.339225213910152, CurrSamplesPerSec=5.689743810942323, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:19:04,202] [INFO] [timer.py:197:stop] 0/8336, RunningAvgSamplesPerSec=6.339226439350858, CurrSamplesPerSec=5.698207513618382, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:19:15,474] [INFO] [timer.py:197:stop] 0/8338, RunningAvgSamplesPerSec=6.339230754324665, CurrSamplesPerSec=5.721747827024824, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:19:26,733] [INFO] [logging.py:68:log_dist] [Rank 0] step=4170, skipped=5, lr=[1.8577777777777778e-06], mom=[[0.9, 0.999]] [2022-12-17 11:19:26,735] [INFO] [timer.py:197:stop] 0/8340, RunningAvgSamplesPerSec=6.339235738048293, CurrSamplesPerSec=5.710465884387955, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:19:37,967] [INFO] [timer.py:197:stop] 0/8342, RunningAvgSamplesPerSec=6.339242698324499, CurrSamplesPerSec=5.721313194470139, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:19:49,272] [INFO] [timer.py:197:stop] 0/8344, RunningAvgSamplesPerSec=6.3392414822520005, CurrSamplesPerSec=5.701325595506997, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:20:00,594] [INFO] [timer.py:197:stop] 0/8346, RunningAvgSamplesPerSec=6.339237516286787, CurrSamplesPerSec=5.672403475904154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:20:11,900] [INFO] [timer.py:197:stop] 0/8348, RunningAvgSamplesPerSec=6.339236875028155, CurrSamplesPerSec=5.685905548740478, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:20:23,180] [INFO] [timer.py:197:stop] 0/8350, RunningAvgSamplesPerSec=6.339236993106727, CurrSamplesPerSec=5.7164289764054255, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 1.8466666666666668e-06, 'epoch': 17.69} [2022-12-17 11:20:34,486] [INFO] [timer.py:197:stop] 0/8352, RunningAvgSamplesPerSec=6.339234923905659, 
CurrSamplesPerSec=5.683368594923178, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:20:45,748] [INFO] [timer.py:197:stop] 0/8354, RunningAvgSamplesPerSec=6.339237401598272, CurrSamplesPerSec=5.708726826885181, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:20:57,054] [INFO] [timer.py:197:stop] 0/8356, RunningAvgSamplesPerSec=6.339235782079853, CurrSamplesPerSec=5.686971373779364, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:21:08,516] [INFO] [timer.py:197:stop] 0/8358, RunningAvgSamplesPerSec=6.339236342324881, CurrSamplesPerSec=5.697629632182958, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:21:19,799] [INFO] [logging.py:68:log_dist] [Rank 0] step=4180, skipped=5, lr=[1.8355555555555557e-06], mom=[[0.9, 0.999]] [2022-12-17 11:21:19,801] [INFO] [timer.py:197:stop] 0/8360, RunningAvgSamplesPerSec=6.339237709966899, CurrSamplesPerSec=5.7126971205425034, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:21:31,034] [INFO] [timer.py:197:stop] 0/8362, RunningAvgSamplesPerSec=6.339244578809194, CurrSamplesPerSec=5.731835822103387, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:21:42,314] [INFO] [timer.py:197:stop] 0/8364, RunningAvgSamplesPerSec=6.3392467301292434, CurrSamplesPerSec=5.7207367131062075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:21:53,630] [INFO] [timer.py:197:stop] 0/8366, RunningAvgSamplesPerSec=6.339244331933694, CurrSamplesPerSec=5.684867813361724, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:22:04,874] [INFO] [timer.py:197:stop] 0/8368, RunningAvgSamplesPerSec=6.339252480676972, CurrSamplesPerSec=5.736243797650853, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:22:16,102] [INFO] [timer.py:197:stop] 0/8370, RunningAvgSamplesPerSec=6.339260552374997, CurrSamplesPerSec=5.731778054383289, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:22:27,417] [INFO] [timer.py:197:stop] 0/8372, RunningAvgSamplesPerSec=6.339258189466109, 
CurrSamplesPerSec=5.665809435703773, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:22:38,695] [INFO] [timer.py:197:stop] 0/8374, RunningAvgSamplesPerSec=6.339260729551199, CurrSamplesPerSec=5.712891646378365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:22:49,975] [INFO] [timer.py:197:stop] 0/8376, RunningAvgSamplesPerSec=6.339262395899968, CurrSamplesPerSec=5.718768919349549, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:23:01,268] [INFO] [timer.py:197:stop] 0/8378, RunningAvgSamplesPerSec=6.3392628381798355, CurrSamplesPerSec=5.6697381999902845, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:23:12,560] [INFO] [logging.py:68:log_dist] [Rank 0] step=4190, skipped=5, lr=[1.8133333333333337e-06], mom=[[0.9, 0.999]] [2022-12-17 11:23:12,562] [INFO] [timer.py:197:stop] 0/8380, RunningAvgSamplesPerSec=6.339263209916691, CurrSamplesPerSec=5.7151639594689145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:23:23,807] [INFO] [timer.py:197:stop] 0/8382, RunningAvgSamplesPerSec=6.339268664321199, CurrSamplesPerSec=5.725624390293013, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:23:35,113] [INFO] [timer.py:197:stop] 0/8384, RunningAvgSamplesPerSec=6.3392655904391715, CurrSamplesPerSec=5.6802217990221235, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:23:46,412] [INFO] [timer.py:197:stop] 0/8386, RunningAvgSamplesPerSec=6.339265396750826, CurrSamplesPerSec=5.690623117243999, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:23:57,717] [INFO] [timer.py:197:stop] 0/8388, RunningAvgSamplesPerSec=6.339263619268187, CurrSamplesPerSec=5.707007270753393, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:24:09,014] [INFO] [timer.py:197:stop] 0/8390, RunningAvgSamplesPerSec=6.339262406010603, CurrSamplesPerSec=5.702546211715137, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:24:20,315] [INFO] [timer.py:197:stop] 0/8392, RunningAvgSamplesPerSec=6.33926228919691, 
CurrSamplesPerSec=5.7045947459036475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:24:31,598] [INFO] [timer.py:197:stop] 0/8394, RunningAvgSamplesPerSec=6.339264393046411, CurrSamplesPerSec=5.722754174297952, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:24:42,857] [INFO] [timer.py:197:stop] 0/8396, RunningAvgSamplesPerSec=6.339268603723656, CurrSamplesPerSec=5.736055522811934, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:24:54,138] [INFO] [timer.py:197:stop] 0/8398, RunningAvgSamplesPerSec=6.339271480524336, CurrSamplesPerSec=5.70896673439108, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:25:05,439] [INFO] [logging.py:68:log_dist] [Rank 0] step=4200, skipped=5, lr=[1.7911111111111113e-06], mom=[[0.9, 0.999]] [2022-12-17 11:25:05,441] [INFO] [timer.py:197:stop] 0/8400, RunningAvgSamplesPerSec=6.339271660929598, CurrSamplesPerSec=5.726014729528468, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 1.7911111111111113e-06, 'epoch': 17.8} [2022-12-17 11:25:16,720] [INFO] [timer.py:197:stop] 0/8402, RunningAvgSamplesPerSec=6.339274754134736, CurrSamplesPerSec=5.722479924827972, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:25:27,987] [INFO] [timer.py:197:stop] 0/8404, RunningAvgSamplesPerSec=6.339277088608752, CurrSamplesPerSec=5.712859062377203, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:25:39,244] [INFO] [timer.py:197:stop] 0/8406, RunningAvgSamplesPerSec=6.3392782652594875, CurrSamplesPerSec=5.707171802301107, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:25:50,788] [INFO] [timer.py:197:stop] 0/8408, RunningAvgSamplesPerSec=6.339279890790081, CurrSamplesPerSec=5.708733625605707, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:26:02,078] [INFO] [timer.py:197:stop] 0/8410, RunningAvgSamplesPerSec=6.339278027613123, CurrSamplesPerSec=5.705521335214915, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:26:13,336] [INFO] 
[timer.py:197:stop] 0/8412, RunningAvgSamplesPerSec=6.339281095125638, CurrSamplesPerSec=5.702650396460303, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:26:24,594] [INFO] [timer.py:197:stop] 0/8414, RunningAvgSamplesPerSec=6.339285152925692, CurrSamplesPerSec=5.705124084723759, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:26:35,852] [INFO] [timer.py:197:stop] 0/8416, RunningAvgSamplesPerSec=6.339290260592174, CurrSamplesPerSec=5.736132253122681, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:26:47,107] [INFO] [timer.py:197:stop] 0/8418, RunningAvgSamplesPerSec=6.339293490660097, CurrSamplesPerSec=5.707359641948332, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:26:58,378] [INFO] [logging.py:68:log_dist] [Rank 0] step=4210, skipped=5, lr=[1.7688888888888891e-06], mom=[[0.9, 0.999]] [2022-12-17 11:26:58,380] [INFO] [timer.py:197:stop] 0/8420, RunningAvgSamplesPerSec=6.339294665055505, CurrSamplesPerSec=5.723917342709765, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:27:09,669] [INFO] [timer.py:197:stop] 0/8422, RunningAvgSamplesPerSec=6.339296075010931, CurrSamplesPerSec=5.711223045375077, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:27:20,912] [INFO] [timer.py:197:stop] 0/8424, RunningAvgSamplesPerSec=6.33930149368414, CurrSamplesPerSec=5.715657047447357, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:27:32,217] [INFO] [timer.py:197:stop] 0/8426, RunningAvgSamplesPerSec=6.339302209541907, CurrSamplesPerSec=5.707435121051169, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:27:43,494] [INFO] [timer.py:197:stop] 0/8428, RunningAvgSamplesPerSec=6.33930898862225, CurrSamplesPerSec=5.732467181689143, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:27:54,732] [INFO] [timer.py:197:stop] 0/8430, RunningAvgSamplesPerSec=6.339317828331498, CurrSamplesPerSec=5.736133969159488, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:28:05,946] [INFO] 
[timer.py:197:stop] 0/8432, RunningAvgSamplesPerSec=6.339327271371329, CurrSamplesPerSec=5.738240320136126, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:28:17,192] [INFO] [timer.py:197:stop] 0/8434, RunningAvgSamplesPerSec=6.339334200311701, CurrSamplesPerSec=5.73747867698906, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:28:28,434] [INFO] [timer.py:197:stop] 0/8436, RunningAvgSamplesPerSec=6.339342476771847, CurrSamplesPerSec=5.727931534738645, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:28:39,663] [INFO] [timer.py:197:stop] 0/8438, RunningAvgSamplesPerSec=6.3393491244231255, CurrSamplesPerSec=5.7417042369934315, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:28:50,891] [INFO] [logging.py:68:log_dist] [Rank 0] step=4220, skipped=5, lr=[1.7466666666666667e-06], mom=[[0.9, 0.999]] [2022-12-17 11:28:50,893] [INFO] [timer.py:197:stop] 0/8440, RunningAvgSamplesPerSec=6.339356976835649, CurrSamplesPerSec=5.740567711638473, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:29:02,107] [INFO] [timer.py:197:stop] 0/8442, RunningAvgSamplesPerSec=6.339367479605312, CurrSamplesPerSec=5.755491501651248, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:29:13,377] [INFO] [timer.py:197:stop] 0/8444, RunningAvgSamplesPerSec=6.33937230949232, CurrSamplesPerSec=5.708431098214918, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:29:24,659] [INFO] [timer.py:197:stop] 0/8446, RunningAvgSamplesPerSec=6.339376247273105, CurrSamplesPerSec=5.726508225755047, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:29:35,917] [INFO] [timer.py:197:stop] 0/8448, RunningAvgSamplesPerSec=6.339383441457143, CurrSamplesPerSec=5.730865185284245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:29:47,208] [INFO] [timer.py:197:stop] 0/8450, RunningAvgSamplesPerSec=6.3393828566553365, CurrSamplesPerSec=5.689957038867744, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 
1.7355555555555555e-06, 'epoch': 17.9} [2022-12-17 11:29:58,637] [INFO] [timer.py:197:stop] 0/8452, RunningAvgSamplesPerSec=6.339384996625631, CurrSamplesPerSec=5.714973902369629, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:30:09,875] [INFO] [timer.py:197:stop] 0/8454, RunningAvgSamplesPerSec=6.339388946893833, CurrSamplesPerSec=5.712536646700399, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:30:21,138] [INFO] [timer.py:197:stop] 0/8456, RunningAvgSamplesPerSec=6.339391369493708, CurrSamplesPerSec=5.715111881106288, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:30:32,468] [INFO] [timer.py:197:stop] 0/8458, RunningAvgSamplesPerSec=6.339383985489258, CurrSamplesPerSec=5.647728124824058, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:30:43,743] [INFO] [logging.py:68:log_dist] [Rank 0] step=4230, skipped=5, lr=[1.7244444444444448e-06], mom=[[0.9, 0.999]] [2022-12-17 11:30:43,745] [INFO] [timer.py:197:stop] 0/8460, RunningAvgSamplesPerSec=6.339386521467177, CurrSamplesPerSec=5.718073337086938, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:30:55,006] [INFO] [timer.py:197:stop] 0/8462, RunningAvgSamplesPerSec=6.339391990345543, CurrSamplesPerSec=5.7277765591619945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:31:06,220] [INFO] [timer.py:197:stop] 0/8464, RunningAvgSamplesPerSec=6.339402510514149, CurrSamplesPerSec=5.751242181357707, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:31:17,439] [INFO] [timer.py:197:stop] 0/8466, RunningAvgSamplesPerSec=6.3394110532341195, CurrSamplesPerSec=5.73423371982176, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:31:28,698] [INFO] [timer.py:197:stop] 0/8468, RunningAvgSamplesPerSec=6.3394181096671325, CurrSamplesPerSec=5.723654941774408, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:31:40,004] [INFO] [timer.py:197:stop] 0/8470, RunningAvgSamplesPerSec=6.339416508958474, CurrSamplesPerSec=5.6938954612453685, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:31:51,293] [INFO] [timer.py:197:stop] 0/8472, RunningAvgSamplesPerSec=6.3394182299776185, CurrSamplesPerSec=5.708501021440033, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:32:02,565] [INFO] [timer.py:197:stop] 0/8474, RunningAvgSamplesPerSec=6.339421522948309, CurrSamplesPerSec=5.715124292218715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:32:13,874] [INFO] [timer.py:197:stop] 0/8476, RunningAvgSamplesPerSec=6.339421159949795, CurrSamplesPerSec=5.7121568945223995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:32:25,178] [INFO] [timer.py:197:stop] 0/8478, RunningAvgSamplesPerSec=6.339421078885325, CurrSamplesPerSec=5.707427354620473, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:32:36,476] [INFO] [logging.py:68:log_dist] [Rank 0] step=4240, skipped=5, lr=[1.7022222222222224e-06], mom=[[0.9, 0.999]] [2022-12-17 11:32:36,477] [INFO] [timer.py:197:stop] 0/8480, RunningAvgSamplesPerSec=6.3394213048783445, CurrSamplesPerSec=5.707564969195127, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:32:47,776] [INFO] [timer.py:197:stop] 0/8482, RunningAvgSamplesPerSec=6.3394211054246545, CurrSamplesPerSec=5.691500040539161, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:32:59,061] [INFO] [timer.py:197:stop] 0/8484, RunningAvgSamplesPerSec=6.339421151126853, CurrSamplesPerSec=5.707390464363309, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:33:10,407] [INFO] [timer.py:197:stop] 0/8486, RunningAvgSamplesPerSec=6.339413566362518, CurrSamplesPerSec=5.662013657402974, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:33:21,713] [INFO] [timer.py:197:stop] 0/8488, RunningAvgSamplesPerSec=6.339409833852926, CurrSamplesPerSec=5.668538529567534, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:33:33,028] [INFO] [timer.py:197:stop] 0/8490, RunningAvgSamplesPerSec=6.339406983036536, CurrSamplesPerSec=5.700460894864342, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:33:44,288] [INFO] [timer.py:197:stop] 0/8492, RunningAvgSamplesPerSec=6.339413427087992, CurrSamplesPerSec=5.715729825287001, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:33:55,568] [INFO] [timer.py:197:stop] 0/8494, RunningAvgSamplesPerSec=6.339413990957854, CurrSamplesPerSec=5.6974368698493985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:34:04,037] [INFO] [timer.py:197:stop] 0/8496, RunningAvgSamplesPerSec=6.339782427831466, CurrSamplesPerSec=10.235213956730592, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:34:15,340] [INFO] [timer.py:197:stop] 0/8498, RunningAvgSamplesPerSec=6.339781535666853, CurrSamplesPerSec=5.692628565900289, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:34:26,644] [INFO] [logging.py:68:log_dist] [Rank 0] step=4250, skipped=5, lr=[1.6800000000000002e-06], mom=[[0.9, 0.999]] [2022-12-17 11:34:26,646] [INFO] [timer.py:197:stop] 0/8500, RunningAvgSamplesPerSec=6.339783631474415, CurrSamplesPerSec=5.701057270515074, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 1.6800000000000002e-06, 'epoch': 18.01} [2022-12-17 11:34:37,949] [INFO] [timer.py:197:stop] 0/8502, RunningAvgSamplesPerSec=6.339780847211561, CurrSamplesPerSec=5.690072825431826, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:34:49,278] [INFO] [timer.py:197:stop] 0/8504, RunningAvgSamplesPerSec=6.339777636169857, CurrSamplesPerSec=5.674001973632397, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:35:00,542] [INFO] [timer.py:197:stop] 0/8506, RunningAvgSamplesPerSec=6.339781747344228, CurrSamplesPerSec=5.719508054321843, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:35:11,842] [INFO] [timer.py:197:stop] 0/8508, RunningAvgSamplesPerSec=6.339781047907795, CurrSamplesPerSec=5.708254598248733, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:35:23,139] [INFO] [timer.py:197:stop] 0/8510, 
RunningAvgSamplesPerSec=6.339781943068733, CurrSamplesPerSec=5.712835232522119, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:35:34,423] [INFO] [timer.py:197:stop] 0/8512, RunningAvgSamplesPerSec=6.3397839002766805, CurrSamplesPerSec=5.713114881098626, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:35:45,681] [INFO] [timer.py:197:stop] 0/8514, RunningAvgSamplesPerSec=6.3397893828611664, CurrSamplesPerSec=5.719634065067116, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:35:56,965] [INFO] [timer.py:197:stop] 0/8516, RunningAvgSamplesPerSec=6.339792945786711, CurrSamplesPerSec=5.705632662583526, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:36:08,255] [INFO] [timer.py:197:stop] 0/8518, RunningAvgSamplesPerSec=6.339795762011441, CurrSamplesPerSec=5.709598894951428, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:36:19,518] [INFO] [logging.py:68:log_dist] [Rank 0] step=4260, skipped=5, lr=[1.6577777777777778e-06], mom=[[0.9, 0.999]] [2022-12-17 11:36:19,520] [INFO] [timer.py:197:stop] 0/8520, RunningAvgSamplesPerSec=6.339800319404483, CurrSamplesPerSec=5.715123318796184, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:36:30,824] [INFO] [timer.py:197:stop] 0/8522, RunningAvgSamplesPerSec=6.339802019617752, CurrSamplesPerSec=5.689481157333476, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:36:42,124] [INFO] [timer.py:197:stop] 0/8524, RunningAvgSamplesPerSec=6.33980181371531, CurrSamplesPerSec=5.68836207618135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:36:53,440] [INFO] [timer.py:197:stop] 0/8526, RunningAvgSamplesPerSec=6.339799072181414, CurrSamplesPerSec=5.693480023593855, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:37:04,800] [INFO] [timer.py:197:stop] 0/8528, RunningAvgSamplesPerSec=6.339796757419858, CurrSamplesPerSec=5.685700331747718, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:37:16,230] [INFO] [timer.py:197:stop] 0/8530, 
RunningAvgSamplesPerSec=6.339793128090444, CurrSamplesPerSec=5.679584830639335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:37:27,507] [INFO] [timer.py:197:stop] 0/8532, RunningAvgSamplesPerSec=6.3397911491863415, CurrSamplesPerSec=5.702511080508184, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:37:38,789] [INFO] [timer.py:197:stop] 0/8534, RunningAvgSamplesPerSec=6.339793107232789, CurrSamplesPerSec=5.71491842073463, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:37:50,118] [INFO] [timer.py:197:stop] 0/8536, RunningAvgSamplesPerSec=6.339791267095922, CurrSamplesPerSec=5.678680580952989, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:38:01,439] [INFO] [timer.py:197:stop] 0/8538, RunningAvgSamplesPerSec=6.339787941149314, CurrSamplesPerSec=5.680405705150381, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:38:12,731] [INFO] [logging.py:68:log_dist] [Rank 0] step=4270, skipped=5, lr=[1.6355555555555559e-06], mom=[[0.9, 0.999]] [2022-12-17 11:38:12,733] [INFO] [timer.py:197:stop] 0/8540, RunningAvgSamplesPerSec=6.339789678550872, CurrSamplesPerSec=5.7091913624282995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:38:24,019] [INFO] [timer.py:197:stop] 0/8542, RunningAvgSamplesPerSec=6.339791034186154, CurrSamplesPerSec=5.71031428192412, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:38:35,314] [INFO] [timer.py:197:stop] 0/8544, RunningAvgSamplesPerSec=6.33978949614059, CurrSamplesPerSec=5.704379207300573, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:38:46,622] [INFO] [timer.py:197:stop] 0/8546, RunningAvgSamplesPerSec=6.339787753883054, CurrSamplesPerSec=5.6868624601278395, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:38:57,914] [INFO] [timer.py:197:stop] 0/8548, RunningAvgSamplesPerSec=6.339788021027447, CurrSamplesPerSec=5.695277469135737, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:39:09,212] [INFO] [timer.py:197:stop] 0/8550, 
RunningAvgSamplesPerSec=6.339785911981671, CurrSamplesPerSec=5.690164493137114, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 1.6244444444444447e-06, 'epoch': 18.11} [2022-12-17 11:39:20,568] [INFO] [timer.py:197:stop] 0/8552, RunningAvgSamplesPerSec=6.339775247921724, CurrSamplesPerSec=5.640723061991974, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:39:31,886] [INFO] [timer.py:197:stop] 0/8554, RunningAvgSamplesPerSec=6.339772199800058, CurrSamplesPerSec=5.696801112656467, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:39:43,155] [INFO] [timer.py:197:stop] 0/8556, RunningAvgSamplesPerSec=6.339773800987133, CurrSamplesPerSec=5.713415959472967, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:39:54,427] [INFO] [timer.py:197:stop] 0/8558, RunningAvgSamplesPerSec=6.339776201349781, CurrSamplesPerSec=5.694955826642644, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:40:05,685] [INFO] [logging.py:68:log_dist] [Rank 0] step=4280, skipped=5, lr=[1.6133333333333335e-06], mom=[[0.9, 0.999]] [2022-12-17 11:40:05,687] [INFO] [timer.py:197:stop] 0/8560, RunningAvgSamplesPerSec=6.33978132883249, CurrSamplesPerSec=5.7272786896522385, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:40:16,991] [INFO] [timer.py:197:stop] 0/8562, RunningAvgSamplesPerSec=6.33977990750635, CurrSamplesPerSec=5.680848811566373, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:40:28,248] [INFO] [timer.py:197:stop] 0/8564, RunningAvgSamplesPerSec=6.339785807700449, CurrSamplesPerSec=5.744079671759363, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:40:39,502] [INFO] [timer.py:197:stop] 0/8566, RunningAvgSamplesPerSec=6.339791741271043, CurrSamplesPerSec=5.73388757579416, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:40:50,820] [INFO] [timer.py:197:stop] 0/8568, RunningAvgSamplesPerSec=6.339788840323985, CurrSamplesPerSec=5.684582496517594, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-17 11:41:02,131] [INFO] [timer.py:197:stop] 0/8570, RunningAvgSamplesPerSec=6.33978424473613, CurrSamplesPerSec=5.664017154219129, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:41:13,415] [INFO] [timer.py:197:stop] 0/8572, RunningAvgSamplesPerSec=6.339786092626063, CurrSamplesPerSec=5.713152088581492, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:41:24,640] [INFO] [timer.py:197:stop] 0/8574, RunningAvgSamplesPerSec=6.339793841673661, CurrSamplesPerSec=5.73153867344533, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:41:35,929] [INFO] [timer.py:197:stop] 0/8576, RunningAvgSamplesPerSec=6.339795662229369, CurrSamplesPerSec=5.677744194300416, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:41:47,173] [INFO] [timer.py:197:stop] 0/8578, RunningAvgSamplesPerSec=6.3398015179993505, CurrSamplesPerSec=5.725435590723989, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:41:58,430] [INFO] [logging.py:68:log_dist] [Rank 0] step=4290, skipped=5, lr=[1.5911111111111113e-06], mom=[[0.9, 0.999]] [2022-12-17 11:41:58,432] [INFO] [timer.py:197:stop] 0/8580, RunningAvgSamplesPerSec=6.339807066560084, CurrSamplesPerSec=5.726189397922931, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:42:09,769] [INFO] [timer.py:197:stop] 0/8582, RunningAvgSamplesPerSec=6.339806762657493, CurrSamplesPerSec=5.7034345693817015, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:42:21,079] [INFO] [timer.py:197:stop] 0/8584, RunningAvgSamplesPerSec=6.339805077050516, CurrSamplesPerSec=5.69950980922965, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:42:32,387] [INFO] [timer.py:197:stop] 0/8586, RunningAvgSamplesPerSec=6.339804042820197, CurrSamplesPerSec=5.706086263438248, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:42:43,713] [INFO] [timer.py:197:stop] 0/8588, RunningAvgSamplesPerSec=6.339797184392537, CurrSamplesPerSec=5.683306264975196, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 
11:42:55,069] [INFO] [timer.py:197:stop] 0/8590, RunningAvgSamplesPerSec=6.339789078503823, CurrSamplesPerSec=5.658868508175669, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:43:06,329] [INFO] [timer.py:197:stop] 0/8592, RunningAvgSamplesPerSec=6.339791742272315, CurrSamplesPerSec=5.724263992921275, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:43:17,617] [INFO] [timer.py:197:stop] 0/8594, RunningAvgSamplesPerSec=6.339790736510714, CurrSamplesPerSec=5.696077503953731, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:43:28,921] [INFO] [timer.py:197:stop] 0/8596, RunningAvgSamplesPerSec=6.339790044616391, CurrSamplesPerSec=5.703937997968867, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:43:40,242] [INFO] [timer.py:197:stop] 0/8598, RunningAvgSamplesPerSec=6.33978624497434, CurrSamplesPerSec=5.700390684260625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:43:51,546] [INFO] [logging.py:68:log_dist] [Rank 0] step=4300, skipped=5, lr=[1.568888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 11:43:51,548] [INFO] [timer.py:197:stop] 0/8600, RunningAvgSamplesPerSec=6.33978249579551, CurrSamplesPerSec=5.683888465500305, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 1.568888888888889e-06, 'epoch': 18.22} [2022-12-17 11:44:02,891] [INFO] [timer.py:197:stop] 0/8602, RunningAvgSamplesPerSec=6.339775170207083, CurrSamplesPerSec=5.671931724371231, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:44:14,204] [INFO] [timer.py:197:stop] 0/8604, RunningAvgSamplesPerSec=6.33977109989562, CurrSamplesPerSec=5.672009864781847, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:44:25,515] [INFO] [timer.py:197:stop] 0/8606, RunningAvgSamplesPerSec=6.339769934462846, CurrSamplesPerSec=5.690476909212335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:44:36,821] [INFO] [timer.py:197:stop] 0/8608, RunningAvgSamplesPerSec=6.339769412472456, CurrSamplesPerSec=5.697288618609522, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:44:48,111] [INFO] [timer.py:197:stop] 0/8610, RunningAvgSamplesPerSec=6.339769140575158, CurrSamplesPerSec=5.71895679212821, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:44:59,394] [INFO] [timer.py:197:stop] 0/8612, RunningAvgSamplesPerSec=6.33977065950526, CurrSamplesPerSec=5.725594836110285, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:45:10,690] [INFO] [timer.py:197:stop] 0/8614, RunningAvgSamplesPerSec=6.339771673358445, CurrSamplesPerSec=5.722855926560641, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:45:21,980] [INFO] [timer.py:197:stop] 0/8616, RunningAvgSamplesPerSec=6.339772835879714, CurrSamplesPerSec=5.707369349759655, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:45:33,267] [INFO] [timer.py:197:stop] 0/8618, RunningAvgSamplesPerSec=6.339774552535556, CurrSamplesPerSec=5.7088997136654624, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:45:44,531] [INFO] [logging.py:68:log_dist] [Rank 0] step=4310, skipped=5, lr=[1.546666666666667e-06], mom=[[0.9, 0.999]] [2022-12-17 11:45:44,533] [INFO] [timer.py:197:stop] 0/8620, RunningAvgSamplesPerSec=6.339776481679606, CurrSamplesPerSec=5.712077400937806, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:45:55,799] [INFO] [timer.py:197:stop] 0/8622, RunningAvgSamplesPerSec=6.339781219744546, CurrSamplesPerSec=5.729563931908303, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:46:07,095] [INFO] [timer.py:197:stop] 0/8624, RunningAvgSamplesPerSec=6.339781527658315, CurrSamplesPerSec=5.721834663889172, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:46:18,379] [INFO] [timer.py:197:stop] 0/8626, RunningAvgSamplesPerSec=6.339784671445439, CurrSamplesPerSec=5.710062114887612, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:46:29,903] [INFO] [timer.py:197:stop] 0/8628, RunningAvgSamplesPerSec=6.339790082020202, CurrSamplesPerSec=5.723102879952324, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:46:41,210] [INFO] [timer.py:197:stop] 0/8630, RunningAvgSamplesPerSec=6.339793936597643, CurrSamplesPerSec=5.722113730534273, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:46:52,497] [INFO] [timer.py:197:stop] 0/8632, RunningAvgSamplesPerSec=6.339795313337186, CurrSamplesPerSec=5.7167929824432875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:47:03,759] [INFO] [timer.py:197:stop] 0/8634, RunningAvgSamplesPerSec=6.339801308717311, CurrSamplesPerSec=5.731501715593646, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:47:15,037] [INFO] [timer.py:197:stop] 0/8636, RunningAvgSamplesPerSec=6.33979993801347, CurrSamplesPerSec=5.697033006125454, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:47:26,315] [INFO] [timer.py:197:stop] 0/8638, RunningAvgSamplesPerSec=6.339800368113021, CurrSamplesPerSec=5.720129631190722, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:47:37,577] [INFO] [logging.py:68:log_dist] [Rank 0] step=4320, skipped=5, lr=[1.5244444444444446e-06], mom=[[0.9, 0.999]] [2022-12-17 11:47:37,578] [INFO] [timer.py:197:stop] 0/8640, RunningAvgSamplesPerSec=6.339805782505671, CurrSamplesPerSec=5.728831976728478, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:47:48,858] [INFO] [timer.py:197:stop] 0/8642, RunningAvgSamplesPerSec=6.339809453462374, CurrSamplesPerSec=5.700420705134696, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:48:00,106] [INFO] [timer.py:197:stop] 0/8644, RunningAvgSamplesPerSec=6.339812171595236, CurrSamplesPerSec=5.729238405192412, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:48:11,360] [INFO] [timer.py:197:stop] 0/8646, RunningAvgSamplesPerSec=6.339814153430132, CurrSamplesPerSec=5.732832008862419, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:48:22,610] [INFO] [timer.py:197:stop] 0/8648, RunningAvgSamplesPerSec=6.339821137286618, CurrSamplesPerSec=5.734705356259326, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:48:33,932] [INFO] [timer.py:197:stop] 0/8650, RunningAvgSamplesPerSec=6.339816322733475, CurrSamplesPerSec=5.6507754896516476, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 1.5133333333333334e-06, 'epoch': 18.33} [2022-12-17 11:48:45,234] [INFO] [timer.py:197:stop] 0/8652, RunningAvgSamplesPerSec=6.339816083819372, CurrSamplesPerSec=5.699778230623335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:48:56,531] [INFO] [timer.py:197:stop] 0/8654, RunningAvgSamplesPerSec=6.339819380427446, CurrSamplesPerSec=5.698967960659726, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:49:07,829] [INFO] [timer.py:197:stop] 0/8656, RunningAvgSamplesPerSec=6.339818811451719, CurrSamplesPerSec=5.692353092011138, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:49:19,089] [INFO] [timer.py:197:stop] 0/8658, RunningAvgSamplesPerSec=6.339821531302611, CurrSamplesPerSec=5.7240638095078475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:49:30,321] [INFO] [logging.py:68:log_dist] [Rank 0] step=4330, skipped=5, lr=[1.5022222222222224e-06], mom=[[0.9, 0.999]] [2022-12-17 11:49:30,323] [INFO] [timer.py:197:stop] 0/8660, RunningAvgSamplesPerSec=6.339830124316635, CurrSamplesPerSec=5.733025949322644, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:49:41,649] [INFO] [timer.py:197:stop] 0/8662, RunningAvgSamplesPerSec=6.33982594014271, CurrSamplesPerSec=5.679973965287332, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:49:52,927] [INFO] [timer.py:197:stop] 0/8664, RunningAvgSamplesPerSec=6.339828757213136, CurrSamplesPerSec=5.725590195398564, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:50:04,238] [INFO] [timer.py:197:stop] 0/8666, RunningAvgSamplesPerSec=6.339825051306159, CurrSamplesPerSec=5.679188299844723, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:50:15,784] [INFO] [timer.py:197:stop] 0/8668, 
RunningAvgSamplesPerSec=6.339826278212767, CurrSamplesPerSec=5.722158373965453, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:50:27,046] [INFO] [timer.py:197:stop] 0/8670, RunningAvgSamplesPerSec=6.339826491115368, CurrSamplesPerSec=5.697858448570247, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:50:38,358] [INFO] [timer.py:197:stop] 0/8672, RunningAvgSamplesPerSec=6.339822180424926, CurrSamplesPerSec=5.675786419696285, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:50:49,661] [INFO] [timer.py:197:stop] 0/8674, RunningAvgSamplesPerSec=6.339819656595667, CurrSamplesPerSec=5.699933147171436, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:51:00,951] [INFO] [timer.py:197:stop] 0/8676, RunningAvgSamplesPerSec=6.339821439488037, CurrSamplesPerSec=5.710637661650436, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:51:12,282] [INFO] [timer.py:197:stop] 0/8678, RunningAvgSamplesPerSec=6.339816397132015, CurrSamplesPerSec=5.663841716955458, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:51:23,590] [INFO] [logging.py:68:log_dist] [Rank 0] step=4340, skipped=5, lr=[1.48e-06], mom=[[0.9, 0.999]] [2022-12-17 11:51:23,592] [INFO] [timer.py:197:stop] 0/8680, RunningAvgSamplesPerSec=6.339814517342316, CurrSamplesPerSec=5.6952871358887025, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:51:34,886] [INFO] [timer.py:197:stop] 0/8682, RunningAvgSamplesPerSec=6.3398151117840555, CurrSamplesPerSec=5.701887997509513, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:51:46,203] [INFO] [timer.py:197:stop] 0/8684, RunningAvgSamplesPerSec=6.339813200596462, CurrSamplesPerSec=5.681629645282856, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:51:57,565] [INFO] [timer.py:197:stop] 0/8686, RunningAvgSamplesPerSec=6.3398039213915185, CurrSamplesPerSec=5.721123459773922, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:52:08,836] [INFO] [timer.py:197:stop] 0/8688, 
RunningAvgSamplesPerSec=6.339807339755771, CurrSamplesPerSec=5.733363662323854, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:52:20,122] [INFO] [timer.py:197:stop] 0/8690, RunningAvgSamplesPerSec=6.3398047820889225, CurrSamplesPerSec=5.693373275569488, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:52:31,419] [INFO] [timer.py:197:stop] 0/8692, RunningAvgSamplesPerSec=6.339805487152986, CurrSamplesPerSec=5.695563860606969, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:52:42,699] [INFO] [timer.py:197:stop] 0/8694, RunningAvgSamplesPerSec=6.339805809847566, CurrSamplesPerSec=5.699512229507981, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:52:53,996] [INFO] [timer.py:197:stop] 0/8696, RunningAvgSamplesPerSec=6.339806687332358, CurrSamplesPerSec=5.70979588176447, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:53:05,318] [INFO] [timer.py:197:stop] 0/8698, RunningAvgSamplesPerSec=6.339802544669305, CurrSamplesPerSec=5.6829250953489625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:53:16,607] [INFO] [logging.py:68:log_dist] [Rank 0] step=4350, skipped=5, lr=[1.457777777777778e-06], mom=[[0.9, 0.999]] [2022-12-17 11:53:16,609] [INFO] [timer.py:197:stop] 0/8700, RunningAvgSamplesPerSec=6.339803640882542, CurrSamplesPerSec=5.697693969817248, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 1.457777777777778e-06, 'epoch': 18.43} [2022-12-17 11:53:27,908] [INFO] [timer.py:197:stop] 0/8702, RunningAvgSamplesPerSec=6.339803013859382, CurrSamplesPerSec=5.698081961389258, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:53:39,182] [INFO] [timer.py:197:stop] 0/8704, RunningAvgSamplesPerSec=6.339804079590559, CurrSamplesPerSec=5.723818481858318, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 11:53:50,469] [INFO] [timer.py:197:stop] 0/8706, RunningAvgSamplesPerSec=6.339801296690999, CurrSamplesPerSec=5.673336183153034, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-17 11:54:01,762] [INFO] [timer.py:197:stop] 0/8708, RunningAvgSamplesPerSec=6.339801617730664, CurrSamplesPerSec=5.713631451847175, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:54:13,066] [INFO] [timer.py:197:stop] 0/8710, RunningAvgSamplesPerSec=6.339801457413741, CurrSamplesPerSec=5.6995541006484745, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:54:24,345] [INFO] [timer.py:197:stop] 0/8712, RunningAvgSamplesPerSec=6.339804380768808, CurrSamplesPerSec=5.7226387618425285, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:54:35,647] [INFO] [timer.py:197:stop] 0/8714, RunningAvgSamplesPerSec=6.339801671989652, CurrSamplesPerSec=5.700162390700976, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:54:46,960] [INFO] [timer.py:197:stop] 0/8716, RunningAvgSamplesPerSec=6.339799675445553, CurrSamplesPerSec=5.690776571377547, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:54:58,263] [INFO] [timer.py:197:stop] 0/8718, RunningAvgSamplesPerSec=6.33979969029298, CurrSamplesPerSec=5.70129169023307, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:55:09,614] [INFO] [logging.py:68:log_dist] [Rank 0] step=4360, skipped=5, lr=[1.4355555555555557e-06], mom=[[0.9, 0.999]]
[2022-12-17 11:55:09,616] [INFO] [timer.py:197:stop] 0/8720, RunningAvgSamplesPerSec=6.339792131215849, CurrSamplesPerSec=5.65222399744496, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:55:20,926] [INFO] [timer.py:197:stop] 0/8722, RunningAvgSamplesPerSec=6.339789881088182, CurrSamplesPerSec=5.686427569989322, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:55:32,475] [INFO] [timer.py:197:stop] 0/8724, RunningAvgSamplesPerSec=6.339788342912373, CurrSamplesPerSec=5.698039628069896, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:55:43,764] [INFO] [timer.py:197:stop] 0/8726, RunningAvgSamplesPerSec=6.339789115483264, CurrSamplesPerSec=5.707825411412025, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:55:55,085] [INFO] [timer.py:197:stop] 0/8728, RunningAvgSamplesPerSec=6.339784990563931, CurrSamplesPerSec=5.674094083754208, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:56:06,372] [INFO] [timer.py:197:stop] 0/8730, RunningAvgSamplesPerSec=6.339786656439554, CurrSamplesPerSec=5.715914333934802, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:56:17,672] [INFO] [timer.py:197:stop] 0/8732, RunningAvgSamplesPerSec=6.339786147497288, CurrSamplesPerSec=5.721630747784866, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:56:28,943] [INFO] [timer.py:197:stop] 0/8734, RunningAvgSamplesPerSec=6.339789929077884, CurrSamplesPerSec=5.724208574891888, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:56:40,244] [INFO] [timer.py:197:stop] 0/8736, RunningAvgSamplesPerSec=6.339790209997535, CurrSamplesPerSec=5.710170218452381, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:56:51,537] [INFO] [timer.py:197:stop] 0/8738, RunningAvgSamplesPerSec=6.339788493527141, CurrSamplesPerSec=5.702072582554532, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:57:02,815] [INFO] [logging.py:68:log_dist] [Rank 0] step=4370, skipped=5, lr=[1.4133333333333335e-06], mom=[[0.9, 0.999]]
[2022-12-17 11:57:02,817] [INFO] [timer.py:197:stop] 0/8740, RunningAvgSamplesPerSec=6.339789347590431, CurrSamplesPerSec=5.720313936415729, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:57:14,137] [INFO] [timer.py:197:stop] 0/8742, RunningAvgSamplesPerSec=6.339785628082903, CurrSamplesPerSec=5.702052718498164, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:57:25,449] [INFO] [timer.py:197:stop] 0/8744, RunningAvgSamplesPerSec=6.339784106733126, CurrSamplesPerSec=5.686165463352801, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:57:36,809] [INFO] [timer.py:197:stop] 0/8746, RunningAvgSamplesPerSec=6.339777927706155, CurrSamplesPerSec=5.664655416875177, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:57:48,124] [INFO] [timer.py:197:stop] 0/8748, RunningAvgSamplesPerSec=6.339776119560729, CurrSamplesPerSec=5.702787296157063, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:57:59,440] [INFO] [timer.py:197:stop] 0/8750, RunningAvgSamplesPerSec=6.339774011859624, CurrSamplesPerSec=5.697596980180033, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0003, 'learning_rate': 1.4022222222222223e-06, 'epoch': 18.54}
[2022-12-17 11:58:10,752] [INFO] [timer.py:197:stop] 0/8752, RunningAvgSamplesPerSec=6.339774009482486, CurrSamplesPerSec=5.704056293554354, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:58:22,086] [INFO] [timer.py:197:stop] 0/8754, RunningAvgSamplesPerSec=6.339770577390216, CurrSamplesPerSec=5.692566032678308, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:58:33,408] [INFO] [timer.py:197:stop] 0/8756, RunningAvgSamplesPerSec=6.339767354342624, CurrSamplesPerSec=5.698916660998225, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:58:44,671] [INFO] [timer.py:197:stop] 0/8758, RunningAvgSamplesPerSec=6.339769632773332, CurrSamplesPerSec=5.714297002697882, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:58:55,958] [INFO] [logging.py:68:log_dist] [Rank 0] step=4380, skipped=5, lr=[1.3911111111111111e-06], mom=[[0.9, 0.999]]
[2022-12-17 11:58:55,960] [INFO] [timer.py:197:stop] 0/8760, RunningAvgSamplesPerSec=6.339770445968811, CurrSamplesPerSec=5.699182848523754, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:59:07,275] [INFO] [timer.py:197:stop] 0/8762, RunningAvgSamplesPerSec=6.339767552717385, CurrSamplesPerSec=5.676156550218765, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:59:18,621] [INFO] [timer.py:197:stop] 0/8764, RunningAvgSamplesPerSec=6.33976088010469, CurrSamplesPerSec=5.650065903817682, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:59:29,908] [INFO] [timer.py:197:stop] 0/8766, RunningAvgSamplesPerSec=6.339760982757028, CurrSamplesPerSec=5.719902922799692, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:59:41,162] [INFO] [timer.py:197:stop] 0/8768, RunningAvgSamplesPerSec=6.339764568109337, CurrSamplesPerSec=5.714843230238036, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 11:59:52,461] [INFO] [timer.py:197:stop] 0/8770, RunningAvgSamplesPerSec=6.339763916685576, CurrSamplesPerSec=5.708102870413749, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:00:03,710] [INFO] [timer.py:197:stop] 0/8772, RunningAvgSamplesPerSec=6.339768770506873, CurrSamplesPerSec=5.707572250574403, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:00:15,199] [INFO] [timer.py:197:stop] 0/8774, RunningAvgSamplesPerSec=6.339768808350476, CurrSamplesPerSec=5.703915696891798, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:00:26,473] [INFO] [timer.py:197:stop] 0/8776, RunningAvgSamplesPerSec=6.3397699540445895, CurrSamplesPerSec=5.71347676267275, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:00:37,762] [INFO] [timer.py:197:stop] 0/8778, RunningAvgSamplesPerSec=6.339771212196474, CurrSamplesPerSec=5.716667827312139, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:00:49,067] [INFO] [logging.py:68:log_dist] [Rank 0] step=4390, skipped=5, lr=[1.3688888888888891e-06], mom=[[0.9, 0.999]]
[2022-12-17 12:00:49,069] [INFO] [timer.py:197:stop] 0/8780, RunningAvgSamplesPerSec=6.33977060376882, CurrSamplesPerSec=5.700833039940614, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:01:00,359] [INFO] [timer.py:197:stop] 0/8782, RunningAvgSamplesPerSec=6.3397703540823045, CurrSamplesPerSec=5.686433592944195, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:01:11,689] [INFO] [timer.py:197:stop] 0/8784, RunningAvgSamplesPerSec=6.339767195511378, CurrSamplesPerSec=5.661934120243849, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:01:22,944] [INFO] [timer.py:197:stop] 0/8786, RunningAvgSamplesPerSec=6.339773632242905, CurrSamplesPerSec=5.7119300884010835, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:01:34,220] [INFO] [timer.py:197:stop] 0/8788, RunningAvgSamplesPerSec=6.3397773377908, CurrSamplesPerSec=5.731191631071715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:01:45,510] [INFO] [timer.py:197:stop] 0/8790, RunningAvgSamplesPerSec=6.339781630455612, CurrSamplesPerSec=5.71911055970979, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:01:56,814] [INFO] [timer.py:197:stop] 0/8792, RunningAvgSamplesPerSec=6.339781805486685, CurrSamplesPerSec=5.695523739898146, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:02:08,149] [INFO] [timer.py:197:stop] 0/8794, RunningAvgSamplesPerSec=6.339777855304927, CurrSamplesPerSec=5.6880899077540805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:02:19,448] [INFO] [timer.py:197:stop] 0/8796, RunningAvgSamplesPerSec=6.339777923984906, CurrSamplesPerSec=5.70172716149224, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:02:30,805] [INFO] [timer.py:197:stop] 0/8798, RunningAvgSamplesPerSec=6.339769911695078, CurrSamplesPerSec=5.64180165759628, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:02:42,118] [INFO] [logging.py:68:log_dist] [Rank 0] step=4400, skipped=5, lr=[1.3466666666666668e-06], mom=[[0.9, 0.999]]
[2022-12-17 12:02:42,120] [INFO] [timer.py:197:stop] 0/8800, RunningAvgSamplesPerSec=6.339769086325866, CurrSamplesPerSec=5.704153745652911, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0003, 'learning_rate': 1.3466666666666668e-06, 'epoch': 18.64}
[2022-12-17 12:02:53,392] [INFO] [timer.py:197:stop] 0/8802, RunningAvgSamplesPerSec=6.339773288566415, CurrSamplesPerSec=5.71960189158303, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:03:04,704] [INFO] [timer.py:197:stop] 0/8804, RunningAvgSamplesPerSec=6.339771021610924, CurrSamplesPerSec=5.698482102803512, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:03:15,993] [INFO] [timer.py:197:stop] 0/8806, RunningAvgSamplesPerSec=6.339771096186967, CurrSamplesPerSec=5.7159873617558965, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:03:27,289] [INFO] [timer.py:197:stop] 0/8808, RunningAvgSamplesPerSec=6.339765525460466, CurrSamplesPerSec=5.687666157923678, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:03:38,637] [INFO] [timer.py:197:stop] 0/8810, RunningAvgSamplesPerSec=6.339758676860227, CurrSamplesPerSec=5.67841582462765, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:03:49,898] [INFO] [timer.py:197:stop] 0/8812, RunningAvgSamplesPerSec=6.339757272024986, CurrSamplesPerSec=5.699347655263886, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:04:01,192] [INFO] [timer.py:197:stop] 0/8814, RunningAvgSamplesPerSec=6.339757954583949, CurrSamplesPerSec=5.717018957337291, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:04:12,542] [INFO] [timer.py:197:stop] 0/8816, RunningAvgSamplesPerSec=6.339752702146053, CurrSamplesPerSec=5.670965693668193, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:04:23,824] [INFO] [timer.py:197:stop] 0/8818, RunningAvgSamplesPerSec=6.339757232182007, CurrSamplesPerSec=5.718000012354702, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:04:35,118] [INFO] [logging.py:68:log_dist] [Rank 0] step=4410, skipped=5, lr=[1.3244444444444446e-06], mom=[[0.9, 0.999]]
[2022-12-17 12:04:35,120] [INFO] [timer.py:197:stop] 0/8820, RunningAvgSamplesPerSec=6.339758213641389, CurrSamplesPerSec=5.712680586457322, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:04:46,597] [INFO] [timer.py:197:stop] 0/8822, RunningAvgSamplesPerSec=6.3397591900724315, CurrSamplesPerSec=5.717291709450172, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:04:57,854] [INFO] [timer.py:197:stop] 0/8824, RunningAvgSamplesPerSec=6.339762923742962, CurrSamplesPerSec=5.7159978292298215, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:05:09,122] [INFO] [timer.py:197:stop] 0/8826, RunningAvgSamplesPerSec=6.339765011045626, CurrSamplesPerSec=5.703528607066971, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:05:20,396] [INFO] [timer.py:197:stop] 0/8828, RunningAvgSamplesPerSec=6.33976912955601, CurrSamplesPerSec=5.723405256061477, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:05:31,663] [INFO] [timer.py:197:stop] 0/8830, RunningAvgSamplesPerSec=6.339773451329037, CurrSamplesPerSec=5.722831769221091, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:05:42,916] [INFO] [timer.py:197:stop] 0/8832, RunningAvgSamplesPerSec=6.339779694712146, CurrSamplesPerSec=5.734509587094221, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:05:54,209] [INFO] [timer.py:197:stop] 0/8834, RunningAvgSamplesPerSec=6.339780787868824, CurrSamplesPerSec=5.692312533690265, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:06:05,497] [INFO] [timer.py:197:stop] 0/8836, RunningAvgSamplesPerSec=6.339782808639936, CurrSamplesPerSec=5.699461888141984, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:06:16,791] [INFO] [timer.py:197:stop] 0/8838, RunningAvgSamplesPerSec=6.339784492164005, CurrSamplesPerSec=5.695571353091131, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:06:28,063] [INFO] [logging.py:68:log_dist] [Rank 0] step=4420, skipped=5, lr=[1.3022222222222222e-06], mom=[[0.9, 0.999]]
[2022-12-17 12:06:28,064] [INFO] [timer.py:197:stop] 0/8840, RunningAvgSamplesPerSec=6.339785659528612, CurrSamplesPerSec=5.691189925329324, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:06:39,357] [INFO] [timer.py:197:stop] 0/8842, RunningAvgSamplesPerSec=6.339784956161142, CurrSamplesPerSec=5.699604927715029, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:06:50,654] [INFO] [timer.py:197:stop] 0/8844, RunningAvgSamplesPerSec=6.339786231913779, CurrSamplesPerSec=5.713109044674698, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:07:01,975] [INFO] [timer.py:197:stop] 0/8846, RunningAvgSamplesPerSec=6.339784180655913, CurrSamplesPerSec=5.687275486635333, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:07:13,286] [INFO] [timer.py:197:stop] 0/8848, RunningAvgSamplesPerSec=6.3397832613420135, CurrSamplesPerSec=5.710735581744126, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:07:24,562] [INFO] [timer.py:197:stop] 0/8850, RunningAvgSamplesPerSec=6.339788823688189, CurrSamplesPerSec=5.731127024156072, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0003, 'learning_rate': 1.2911111111111112e-06, 'epoch': 18.75}
[2022-12-17 12:07:35,838] [INFO] [timer.py:197:stop] 0/8852, RunningAvgSamplesPerSec=6.339792531338122, CurrSamplesPerSec=5.726484770543781, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:07:47,127] [INFO] [timer.py:197:stop] 0/8854, RunningAvgSamplesPerSec=6.339794181176809, CurrSamplesPerSec=5.726559779030069, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:07:58,386] [INFO] [timer.py:197:stop] 0/8856, RunningAvgSamplesPerSec=6.339795549648679, CurrSamplesPerSec=5.70901481544623, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:08:09,671] [INFO] [timer.py:197:stop] 0/8858, RunningAvgSamplesPerSec=6.339797797999399, CurrSamplesPerSec=5.713039008517689, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:08:20,970] [INFO] [logging.py:68:log_dist] [Rank 0] step=4430, skipped=5, lr=[1.28e-06], mom=[[0.9, 0.999]]
[2022-12-17 12:08:20,972] [INFO] [timer.py:197:stop] 0/8860, RunningAvgSamplesPerSec=6.33979745410494, CurrSamplesPerSec=5.698277670452691, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:08:32,265] [INFO] [timer.py:197:stop] 0/8862, RunningAvgSamplesPerSec=6.339798306758194, CurrSamplesPerSec=5.715546058725313, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:08:43,513] [INFO] [timer.py:197:stop] 0/8864, RunningAvgSamplesPerSec=6.339806136942207, CurrSamplesPerSec=5.735413814205769, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:08:54,778] [INFO] [timer.py:197:stop] 0/8866, RunningAvgSamplesPerSec=6.339811109255454, CurrSamplesPerSec=5.736914872436329, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:09:06,063] [INFO] [timer.py:197:stop] 0/8868, RunningAvgSamplesPerSec=6.33981305741132, CurrSamplesPerSec=5.706303629299453, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:09:17,358] [INFO] [timer.py:197:stop] 0/8870, RunningAvgSamplesPerSec=6.339813871437535, CurrSamplesPerSec=5.696534905788065, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:09:28,799] [INFO] [timer.py:197:stop] 0/8872, RunningAvgSamplesPerSec=6.3398134435915, CurrSamplesPerSec=5.698347829014574, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:09:40,104] [INFO] [timer.py:197:stop] 0/8874, RunningAvgSamplesPerSec=6.339812533732992, CurrSamplesPerSec=5.698448957153876, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:09:51,406] [INFO] [timer.py:197:stop] 0/8876, RunningAvgSamplesPerSec=6.33981209534651, CurrSamplesPerSec=5.700115669034691, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:10:02,697] [INFO] [timer.py:197:stop] 0/8878, RunningAvgSamplesPerSec=6.3398137092677285, CurrSamplesPerSec=5.719678669814262, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:10:13,951] [INFO] [logging.py:68:log_dist] [Rank 0] step=4440, skipped=5, lr=[1.2577777777777779e-06], mom=[[0.9, 0.999]]
[2022-12-17 12:10:13,953] [INFO] [timer.py:197:stop] 0/8880, RunningAvgSamplesPerSec=6.339817566446612, CurrSamplesPerSec=5.738804139876291, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:10:25,255] [INFO] [timer.py:197:stop] 0/8882, RunningAvgSamplesPerSec=6.339816636345211, CurrSamplesPerSec=5.694697765622774, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:10:36,527] [INFO] [timer.py:197:stop] 0/8884, RunningAvgSamplesPerSec=6.33982069811268, CurrSamplesPerSec=5.7302190106115445, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:10:47,765] [INFO] [timer.py:197:stop] 0/8886, RunningAvgSamplesPerSec=6.339829384718083, CurrSamplesPerSec=5.74620588582076, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:10:59,051] [INFO] [timer.py:197:stop] 0/8888, RunningAvgSamplesPerSec=6.339830894803963, CurrSamplesPerSec=5.724798697750464, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:11:10,348] [INFO] [timer.py:197:stop] 0/8890, RunningAvgSamplesPerSec=6.339831816636154, CurrSamplesPerSec=5.715971538902973, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:11:21,641] [INFO] [timer.py:197:stop] 0/8892, RunningAvgSamplesPerSec=6.339830535624382, CurrSamplesPerSec=5.6835113091805844, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:11:32,945] [INFO] [timer.py:197:stop] 0/8894, RunningAvgSamplesPerSec=6.339829300115783, CurrSamplesPerSec=5.701985617770172, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:11:44,214] [INFO] [timer.py:197:stop] 0/8896, RunningAvgSamplesPerSec=6.339830957348836, CurrSamplesPerSec=5.710873112905728, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:11:55,472] [INFO] [timer.py:197:stop] 0/8898, RunningAvgSamplesPerSec=6.339832635853104, CurrSamplesPerSec=5.713768636047296, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:12:06,767] [INFO] [logging.py:68:log_dist] [Rank 0] step=4450, skipped=5, lr=[1.2355555555555557e-06], mom=[[0.9, 0.999]]
[2022-12-17 12:12:06,769] [INFO] [timer.py:197:stop] 0/8900, RunningAvgSamplesPerSec=6.339833089257597, CurrSamplesPerSec=5.69870590651442, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0003, 'learning_rate': 1.2355555555555557e-06, 'epoch': 18.86}
[2022-12-17 12:12:18,030] [INFO] [timer.py:197:stop] 0/8902, RunningAvgSamplesPerSec=6.339837649166962, CurrSamplesPerSec=5.7158598077118485, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:12:29,300] [INFO] [timer.py:197:stop] 0/8904, RunningAvgSamplesPerSec=6.339839674023829, CurrSamplesPerSec=5.711668785350645, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:12:40,622] [INFO] [timer.py:197:stop] 0/8906, RunningAvgSamplesPerSec=6.339835937166133, CurrSamplesPerSec=5.700604226707294, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:12:51,901] [INFO] [timer.py:197:stop] 0/8908, RunningAvgSamplesPerSec=6.339838380970233, CurrSamplesPerSec=5.713043142548713, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:13:03,166] [INFO] [timer.py:197:stop] 0/8910, RunningAvgSamplesPerSec=6.339842998569798, CurrSamplesPerSec=5.715696965497805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:13:14,456] [INFO] [timer.py:197:stop] 0/8912, RunningAvgSamplesPerSec=6.3398408748108315, CurrSamplesPerSec=5.711529271465982, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:13:25,746] [INFO] [timer.py:197:stop] 0/8914, RunningAvgSamplesPerSec=6.339839533197108, CurrSamplesPerSec=5.693573008732244, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:13:37,015] [INFO] [timer.py:197:stop] 0/8916, RunningAvgSamplesPerSec=6.339840725581212, CurrSamplesPerSec=5.708154092828949, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:13:48,318] [INFO] [timer.py:197:stop] 0/8918, RunningAvgSamplesPerSec=6.339840744759129, CurrSamplesPerSec=5.702353358837302, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:13:59,620] [INFO] [logging.py:68:log_dist] [Rank 0] step=4460, skipped=5, lr=[1.2133333333333335e-06], mom=[[0.9, 0.999]]
[2022-12-17 12:13:59,622] [INFO] [timer.py:197:stop] 0/8920, RunningAvgSamplesPerSec=6.339842393762703, CurrSamplesPerSec=5.709589665323134, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:14:10,959] [INFO] [timer.py:197:stop] 0/8922, RunningAvgSamplesPerSec=6.339837047896928, CurrSamplesPerSec=5.680314111197988, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:14:22,277] [INFO] [timer.py:197:stop] 0/8924, RunningAvgSamplesPerSec=6.33983454166607, CurrSamplesPerSec=5.685408188041345, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:14:33,580] [INFO] [timer.py:197:stop] 0/8926, RunningAvgSamplesPerSec=6.339831781894776, CurrSamplesPerSec=5.699041282062267, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:14:44,921] [INFO] [timer.py:197:stop] 0/8928, RunningAvgSamplesPerSec=6.339826765134753, CurrSamplesPerSec=5.681271305657147, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:14:56,483] [INFO] [timer.py:197:stop] 0/8930, RunningAvgSamplesPerSec=6.3398219933408395, CurrSamplesPerSec=5.671220649710971, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:15:07,811] [INFO] [timer.py:197:stop] 0/8932, RunningAvgSamplesPerSec=6.339817462524407, CurrSamplesPerSec=5.685406502220849, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:15:19,085] [INFO] [timer.py:197:stop] 0/8934, RunningAvgSamplesPerSec=6.339820507025385, CurrSamplesPerSec=5.731930309104549, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:15:30,372] [INFO] [timer.py:197:stop] 0/8936, RunningAvgSamplesPerSec=6.3398200356741485, CurrSamplesPerSec=5.698484522209096, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:15:41,621] [INFO] [timer.py:197:stop] 0/8938, RunningAvgSamplesPerSec=6.3398251188229, CurrSamplesPerSec=5.7179093943535735, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:15:52,917] [INFO] [logging.py:68:log_dist] [Rank 0] step=4470, skipped=5, lr=[1.1911111111111111e-06], mom=[[0.9, 0.999]]
[2022-12-17 12:15:52,918] [INFO] [timer.py:197:stop] 0/8940, RunningAvgSamplesPerSec=6.339826107081712, CurrSamplesPerSec=5.704901231534841, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:16:04,199] [INFO] [timer.py:197:stop] 0/8942, RunningAvgSamplesPerSec=6.33982693796467, CurrSamplesPerSec=5.703309998460055, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:16:15,513] [INFO] [timer.py:197:stop] 0/8944, RunningAvgSamplesPerSec=6.339823082241985, CurrSamplesPerSec=5.684895503862783, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:16:26,828] [INFO] [timer.py:197:stop] 0/8946, RunningAvgSamplesPerSec=6.33982192948109, CurrSamplesPerSec=5.693825653821079, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:16:38,122] [INFO] [timer.py:197:stop] 0/8948, RunningAvgSamplesPerSec=6.339821516488836, CurrSamplesPerSec=5.713446117698301, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:16:49,415] [INFO] [timer.py:197:stop] 0/8950, RunningAvgSamplesPerSec=6.339822438115419, CurrSamplesPerSec=5.707341439891114, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0003, 'learning_rate': 1.1800000000000001e-06, 'epoch': 18.96}
[2022-12-17 12:17:00,729] [INFO] [timer.py:197:stop] 0/8952, RunningAvgSamplesPerSec=6.339821235613619, CurrSamplesPerSec=5.711667083920755, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:17:12,038] [INFO] [timer.py:197:stop] 0/8954, RunningAvgSamplesPerSec=6.339820683290209, CurrSamplesPerSec=5.704606868918564, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:17:23,332] [INFO] [timer.py:197:stop] 0/8956, RunningAvgSamplesPerSec=6.33982214033586, CurrSamplesPerSec=5.708217211761304, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:17:34,674] [INFO] [timer.py:197:stop] 0/8958, RunningAvgSamplesPerSec=6.339817165687331, CurrSamplesPerSec=5.6624811326537765, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:17:45,974] [INFO] [logging.py:68:log_dist] [Rank 0] step=4480, skipped=5, lr=[1.168888888888889e-06], mom=[[0.9, 0.999]]
[2022-12-17 12:17:45,975] [INFO] [timer.py:197:stop] 0/8960, RunningAvgSamplesPerSec=6.339816877634681, CurrSamplesPerSec=5.697127558306983, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:17:57,229] [INFO] [timer.py:197:stop] 0/8962, RunningAvgSamplesPerSec=6.339821222503794, CurrSamplesPerSec=5.732452981301859, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
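The DeepSpeed `log_dist` lines in this stretch can be checked against the schedule in the launch command: `--learning_rate=1e-5`, `--warmup_steps=500`, `--max_steps=5000`, with `lr_scheduler_type=linear`. The sketch below is a minimal Python check. It assumes the post-warmup decay is a straight line to zero at `max_steps` and that the per-optimizer-step batch is `per_device_train_batch_size` times `gradient_accumulation_steps` on the single GPU. It parses three of these lines (steps 4360, 4400 and 4560), inverts the decay to recover the schedule position implied by each logged `lr`, and uses the timestamps to estimate seconds per optimizer step. The regex and the helper names (`parse`, `implied_schedule_step`) are illustrative only and are not part of DeepSpeed or Transformers.

```python
import re
from datetime import datetime

# Three DeepSpeed log_dist lines copied from this part of the log.
LOG_LINES = [
    "[2022-12-17 11:55:09,614] [INFO] [logging.py:68:log_dist] [Rank 0] "
    "step=4360, skipped=5, lr=[1.4355555555555557e-06], mom=[[0.9, 0.999]]",
    "[2022-12-17 12:02:42,118] [INFO] [logging.py:68:log_dist] [Rank 0] "
    "step=4400, skipped=5, lr=[1.3466666666666668e-06], mom=[[0.9, 0.999]]",
    "[2022-12-17 12:32:47,314] [INFO] [logging.py:68:log_dist] [Rank 0] "
    "step=4560, skipped=5, lr=[9.911111111111111e-07], mom=[[0.9, 0.999]]",
]

# Values from the launch command; the linear-decay shape is an assumption.
PEAK_LR, WARMUP, MAX_STEPS = 1e-5, 500, 5000
# Assumed samples per optimizer step: batch size 32 x 2 accumulation steps, 1 GPU.
SAMPLES_PER_STEP = 32 * 2

PATTERN = re.compile(
    r"^\[(?P<ts>[\d-]+ [\d:,]+)\].*step=(?P<step>\d+), "
    r"skipped=(?P<skipped>\d+), lr=\[(?P<lr>[\d.e+-]+)\]"
)


def parse(line):
    """Return (timestamp, step, skipped, lr) from one log_dist line."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError(f"not a log_dist step line: {line!r}")
    ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
    return ts, int(m["step"]), int(m["skipped"]), float(m["lr"])


def implied_schedule_step(lr):
    """Invert lr = PEAK_LR * (MAX_STEPS - s) / (MAX_STEPS - WARMUP) for s."""
    return round(MAX_STEPS - lr * (MAX_STEPS - WARMUP) / PEAK_LR)


records = [parse(line) for line in LOG_LINES]
# Gap between DeepSpeed's step counter and the schedule position each lr implies.
offsets = [step - implied_schedule_step(lr) for _, step, _, lr in records]

(t0, s0, _, _), (t1, s1, _, _) = records[0], records[-1]
sec_per_step = (t1 - t0).total_seconds() / (s1 - s0)
samples_per_sec = SAMPLES_PER_STEP / sec_per_step
remaining_min = (MAX_STEPS - s1) * sec_per_step / 60

print("step minus implied schedule step:", offsets)
print(f"{sec_per_step:.2f} s/step, {samples_per_sec:.2f} samples/s")
print(f"~{remaining_min:.0f} min of training left after step {s1}")
```

Each logged `lr` sits a constant 6 schedule steps behind DeepSpeed's `step` counter, and consecutive values drop by exactly 1e-5/4500 per step, which is consistent with linear decay. The log itself does not explain the offset. If skipped fp16-overflow steps do not advance the scheduler, the 5 `skipped` steps would explain most of it, but that is an assumption. About 11.3 s per optimizer step corresponds to roughly 5.7 samples/s, which agrees with the `CurrSamplesPerSec` values. At that rate, the remaining 440 steps to `max_steps=5000` come to roughly 83 minutes, not counting the evaluation and checkpoint at step 5000.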
[2022-12-17 12:18:08,515] [INFO] [timer.py:197:stop] 0/8964, RunningAvgSamplesPerSec=6.339822832371627, CurrSamplesPerSec=5.708728769375107, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:18:19,834] [INFO] [timer.py:197:stop] 0/8966, RunningAvgSamplesPerSec=6.339819980968422, CurrSamplesPerSec=5.691758294557167, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:18:28,319] [INFO] [timer.py:197:stop] 0/8968, RunningAvgSamplesPerSec=6.340167276359993, CurrSamplesPerSec=10.235759569209229, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:18:39,613] [INFO] [timer.py:197:stop] 0/8970, RunningAvgSamplesPerSec=6.340168201205643, CurrSamplesPerSec=5.688185609714782, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:18:50,885] [INFO] [timer.py:197:stop] 0/8972, RunningAvgSamplesPerSec=6.340167215906907, CurrSamplesPerSec=5.68072835094578, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:19:02,136] [INFO] [timer.py:197:stop] 0/8974, RunningAvgSamplesPerSec=6.340170526445283, CurrSamplesPerSec=5.724859987652656, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:19:13,421] [INFO] [timer.py:197:stop] 0/8976, RunningAvgSamplesPerSec=6.340171770538228, CurrSamplesPerSec=5.715203870632199, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:19:24,671] [INFO] [timer.py:197:stop] 0/8978, RunningAvgSamplesPerSec=6.340178150339853, CurrSamplesPerSec=5.705607680246926, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:19:35,941] [INFO] [logging.py:68:log_dist] [Rank 0] step=4490, skipped=5, lr=[1.1466666666666668e-06], mom=[[0.9, 0.999]]
[2022-12-17 12:19:35,943] [INFO] [timer.py:197:stop] 0/8980, RunningAvgSamplesPerSec=6.340180370770885, CurrSamplesPerSec=5.697885298209449, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:19:47,252] [INFO] [timer.py:197:stop] 0/8982, RunningAvgSamplesPerSec=6.3401789882383754, CurrSamplesPerSec=5.693759954279462, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:19:58,646] [INFO] [timer.py:197:stop] 0/8984, RunningAvgSamplesPerSec=6.340187216793461, CurrSamplesPerSec=5.750422141770742, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:20:09,921] [INFO] [timer.py:197:stop] 0/8986, RunningAvgSamplesPerSec=6.340188161061402, CurrSamplesPerSec=5.6917780869534536, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:20:21,256] [INFO] [timer.py:197:stop] 0/8988, RunningAvgSamplesPerSec=6.340183001426609, CurrSamplesPerSec=5.67539185899419, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:20:32,558] [INFO] [timer.py:197:stop] 0/8990, RunningAvgSamplesPerSec=6.3401830253250155, CurrSamplesPerSec=5.701508933152393, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:20:43,826] [INFO] [timer.py:197:stop] 0/8992, RunningAvgSamplesPerSec=6.340185600175935, CurrSamplesPerSec=5.726181824666652, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:20:55,092] [INFO] [timer.py:197:stop] 0/8994, RunningAvgSamplesPerSec=6.340188325164848, CurrSamplesPerSec=5.745743916480419, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:21:06,365] [INFO] [timer.py:197:stop] 0/8996, RunningAvgSamplesPerSec=6.340190745416319, CurrSamplesPerSec=5.722420881672769, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:21:17,714] [INFO] [timer.py:197:stop] 0/8998, RunningAvgSamplesPerSec=6.340184455569013, CurrSamplesPerSec=5.684199952042192, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:21:29,042] [INFO] [logging.py:68:log_dist] [Rank 0] step=4500, skipped=5, lr=[1.1244444444444446e-06], mom=[[0.9, 0.999]]
[2022-12-17 12:21:29,043] [INFO] [timer.py:197:stop] 0/9000, RunningAvgSamplesPerSec=6.3401812963789865, CurrSamplesPerSec=5.70212587704807, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0003, 'learning_rate': 1.1244444444444446e-06, 'epoch': 19.07}
[2022-12-17 12:21:40,383] [INFO] [timer.py:197:stop] 0/9002, RunningAvgSamplesPerSec=6.340174691416419, CurrSamplesPerSec=5.678948005097177, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:21:51,710] [INFO] [timer.py:197:stop] 0/9004, RunningAvgSamplesPerSec=6.340169051770326, CurrSamplesPerSec=5.696995766485775, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:22:03,013] [INFO] [timer.py:197:stop] 0/9006, RunningAvgSamplesPerSec=6.340169439424454, CurrSamplesPerSec=5.704673788888134, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:22:14,286] [INFO] [timer.py:197:stop] 0/9008, RunningAvgSamplesPerSec=6.340170547414348, CurrSamplesPerSec=5.723376701019164, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:22:25,553] [INFO] [timer.py:197:stop] 0/9010, RunningAvgSamplesPerSec=6.340172793652232, CurrSamplesPerSec=5.716274622624514, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:22:36,828] [INFO] [timer.py:197:stop] 0/9012, RunningAvgSamplesPerSec=6.340174630700916, CurrSamplesPerSec=5.695876144722008, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:22:48,136] [INFO] [timer.py:197:stop] 0/9014, RunningAvgSamplesPerSec=6.340173785718381, CurrSamplesPerSec=5.690204055901287, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:22:59,440] [INFO] [timer.py:197:stop] 0/9016, RunningAvgSamplesPerSec=6.34017389648435, CurrSamplesPerSec=5.714023319927587, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:23:10,759] [INFO] [timer.py:197:stop] 0/9018, RunningAvgSamplesPerSec=6.340171209682552, CurrSamplesPerSec=5.690274498282226, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:23:22,023] [INFO] [logging.py:68:log_dist] [Rank 0] step=4510, skipped=5, lr=[1.1022222222222222e-06], mom=[[0.9, 0.999]]
[2022-12-17 12:23:22,025] [INFO] [timer.py:197:stop] 0/9020, RunningAvgSamplesPerSec=6.340176370473775, CurrSamplesPerSec=5.725990056949864, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:23:33,299] [INFO] [timer.py:197:stop] 0/9022, RunningAvgSamplesPerSec=6.340177698072755, CurrSamplesPerSec=5.703774137599782, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:23:44,547] [INFO] [timer.py:197:stop] 0/9024, RunningAvgSamplesPerSec=6.340182563683797, CurrSamplesPerSec=5.724629729995669, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:23:55,848] [INFO] [timer.py:197:stop] 0/9026, RunningAvgSamplesPerSec=6.3401803594391275, CurrSamplesPerSec=5.691665610045184, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:24:07,132] [INFO] [timer.py:197:stop] 0/9028, RunningAvgSamplesPerSec=6.340180826764234, CurrSamplesPerSec=5.713259336390843, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:24:18,389] [INFO] [timer.py:197:stop] 0/9030, RunningAvgSamplesPerSec=6.340182489916034, CurrSamplesPerSec=5.702290127253462, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:24:29,654] [INFO] [timer.py:197:stop] 0/9032, RunningAvgSamplesPerSec=6.3401870134176725, CurrSamplesPerSec=5.718191488867679, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:24:40,926] [INFO] [timer.py:197:stop] 0/9034, RunningAvgSamplesPerSec=6.340189064722712, CurrSamplesPerSec=5.72053580105053, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:24:52,216] [INFO] [timer.py:197:stop] 0/9036, RunningAvgSamplesPerSec=6.340191777273399, CurrSamplesPerSec=5.707461332910816, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:25:03,638] [INFO] [timer.py:197:stop] 0/9038, RunningAvgSamplesPerSec=6.340195482911859, CurrSamplesPerSec=5.717401791781072, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:25:14,941] [INFO] [logging.py:68:log_dist] [Rank 0] step=4520, skipped=5, lr=[1.08e-06], mom=[[0.9, 0.999]]
[2022-12-17 12:25:14,943] [INFO] [timer.py:197:stop] 0/9040, RunningAvgSamplesPerSec=6.340194721533849, CurrSamplesPerSec=5.693289473929791, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:25:26,277] [INFO] [timer.py:197:stop] 0/9042, RunningAvgSamplesPerSec=6.340189349571833, CurrSamplesPerSec=5.685959264084955, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:25:37,626] [INFO] [timer.py:197:stop] 0/9044, RunningAvgSamplesPerSec=6.340182920822173, CurrSamplesPerSec=5.652958172497661, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:25:48,933] [INFO] [timer.py:197:stop] 0/9046, RunningAvgSamplesPerSec=6.340182202502154, CurrSamplesPerSec=5.681926211518131, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:26:00,260] [INFO] [timer.py:197:stop] 0/9048, RunningAvgSamplesPerSec=6.340178368778449, CurrSamplesPerSec=5.672029999469638, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:26:11,568] [INFO] [timer.py:197:stop] 0/9050, RunningAvgSamplesPerSec=6.340177804213071, CurrSamplesPerSec=5.712056980924633, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0003, 'learning_rate': 1.068888888888889e-06, 'epoch': 19.17}
[2022-12-17 12:26:22,889] [INFO] [timer.py:197:stop] 0/9052, RunningAvgSamplesPerSec=6.340176452103689, CurrSamplesPerSec=5.711228634922189, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:26:34,174] [INFO] [timer.py:197:stop] 0/9054, RunningAvgSamplesPerSec=6.340178437975144, CurrSamplesPerSec=5.725612666283189, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:26:45,490] [INFO] [timer.py:197:stop] 0/9056, RunningAvgSamplesPerSec=6.340176071690772, CurrSamplesPerSec=5.682095554381373, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:26:56,781] [INFO] [timer.py:197:stop] 0/9058, RunningAvgSamplesPerSec=6.340175540872804, CurrSamplesPerSec=5.678521772244945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:27:08,081] [INFO] [logging.py:68:log_dist] [Rank 0] step=4530, skipped=5, lr=[1.0577777777777779e-06], mom=[[0.9, 0.999]]
[2022-12-17 12:27:08,083] [INFO] [timer.py:197:stop] 0/9060, RunningAvgSamplesPerSec=6.340176084418741, CurrSamplesPerSec=5.714119653272828, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:27:19,412] [INFO] [timer.py:197:stop] 0/9062, RunningAvgSamplesPerSec=6.340172492979608, CurrSamplesPerSec=5.6913443753728625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:27:30,750] [INFO] [timer.py:197:stop] 0/9064, RunningAvgSamplesPerSec=6.340167495179233, CurrSamplesPerSec=5.682358969947255, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:27:42,051] [INFO] [timer.py:197:stop] 0/9066, RunningAvgSamplesPerSec=6.340168341948707, CurrSamplesPerSec=5.705136452494435, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:27:53,438] [INFO] [timer.py:197:stop] 0/9068, RunningAvgSamplesPerSec=6.340156127963793, CurrSamplesPerSec=5.644270297923019, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:28:04,764] [INFO] [timer.py:197:stop] 0/9070, RunningAvgSamplesPerSec=6.340150850458402, CurrSamplesPerSec=5.682519196291389, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:28:16,074] [INFO] [timer.py:197:stop] 0/9072, RunningAvgSamplesPerSec=6.340147652003948, CurrSamplesPerSec=5.682816817849004, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:28:27,427] [INFO] [timer.py:197:stop] 0/9074, RunningAvgSamplesPerSec=6.340137880887122, CurrSamplesPerSec=5.662173932761074, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:28:38,748] [INFO] [timer.py:197:stop] 0/9076, RunningAvgSamplesPerSec=6.340134725198894, CurrSamplesPerSec=5.695298977705807, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:28:50,058] [INFO] [timer.py:197:stop] 0/9078, RunningAvgSamplesPerSec=6.3401333643651965, CurrSamplesPerSec=5.709362820154792, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:29:01,317] [INFO] [logging.py:68:log_dist] [Rank 0] step=4540, skipped=5, lr=[1.0355555555555557e-06], mom=[[0.9, 0.999]]
[2022-12-17 12:29:01,319] [INFO] [timer.py:197:stop] 0/9080, RunningAvgSamplesPerSec=6.340137162093382, CurrSamplesPerSec=5.72625462647566, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:29:12,608] [INFO] [timer.py:197:stop] 0/9082, RunningAvgSamplesPerSec=6.3401393811599265, CurrSamplesPerSec=5.711931789987668, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:29:23,896] [INFO] [timer.py:197:stop] 0/9084, RunningAvgSamplesPerSec=6.34014109355379, CurrSamplesPerSec=5.710250387784129, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:29:35,172] [INFO] [timer.py:197:stop] 0/9086, RunningAvgSamplesPerSec=6.340143988565522, CurrSamplesPerSec=5.714197257496509, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:29:46,509] [INFO] [timer.py:197:stop] 0/9088, RunningAvgSamplesPerSec=6.340138799723675, CurrSamplesPerSec=5.687047278459022, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:29:57,807] [INFO] [timer.py:197:stop] 0/9090, RunningAvgSamplesPerSec=6.340138699289519, CurrSamplesPerSec=5.699062819070333, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:30:09,279] [INFO] [timer.py:197:stop] 0/9092, RunningAvgSamplesPerSec=6.340141479739859, CurrSamplesPerSec=5.705302088787175, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:30:20,576] [INFO] [timer.py:197:stop] 0/9094, RunningAvgSamplesPerSec=6.340141333697836, CurrSamplesPerSec=5.702480553052258, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:30:31,849] [INFO] [timer.py:197:stop] 0/9096, RunningAvgSamplesPerSec=6.340142391459165, CurrSamplesPerSec=5.707457206953979, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:30:43,139] [INFO] [timer.py:197:stop] 0/9098, RunningAvgSamplesPerSec=6.340144053045042, CurrSamplesPerSec=5.693934834353835, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:30:54,423] [INFO] [logging.py:68:log_dist] [Rank 0] step=4550, skipped=5, lr=[1.0133333333333333e-06], mom=[[0.9, 0.999]]
[2022-12-17 12:30:54,424] [INFO] [timer.py:197:stop] 0/9100, RunningAvgSamplesPerSec=6.34014581087975, CurrSamplesPerSec=5.70825823981589, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0003, 'learning_rate': 1.0133333333333333e-06, 'epoch': 19.28}
[2022-12-17 12:31:05,693] [INFO] [timer.py:197:stop] 0/9102, RunningAvgSamplesPerSec=6.340146300653431, CurrSamplesPerSec=5.721616113216765, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:31:16,984] [INFO] [timer.py:197:stop] 0/9104, RunningAvgSamplesPerSec=6.340146021568912, CurrSamplesPerSec=5.704086595467083, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:31:28,249] [INFO] [timer.py:197:stop] 0/9106, RunningAvgSamplesPerSec=6.340149200742008, CurrSamplesPerSec=5.708663211070656, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:31:39,554] [INFO] [timer.py:197:stop] 0/9108, RunningAvgSamplesPerSec=6.340145155623433, CurrSamplesPerSec=5.688826920042316, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:31:50,834] [INFO] [timer.py:197:stop] 0/9110, RunningAvgSamplesPerSec=6.340147094179502, CurrSamplesPerSec=5.716423376674596, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:32:02,157] [INFO] [timer.py:197:stop] 0/9112, RunningAvgSamplesPerSec=6.340143520926042, CurrSamplesPerSec=5.692019227323512, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:32:13,411] [INFO] [timer.py:197:stop] 0/9114, RunningAvgSamplesPerSec=6.340147602549759, CurrSamplesPerSec=5.720771581562675, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:32:24,684] [INFO] [timer.py:197:stop] 0/9116, RunningAvgSamplesPerSec=6.340150929871223, CurrSamplesPerSec=5.727981891384154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:32:36,009] [INFO] [timer.py:197:stop] 0/9118, RunningAvgSamplesPerSec=6.340147143194762, CurrSamplesPerSec=5.6731167655569115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:32:47,314] [INFO] [logging.py:68:log_dist] [Rank 0] step=4560, skipped=5, lr=[9.911111111111111e-07], mom=[[0.9, 0.999]]
[2022-12-17 12:32:47,316] [INFO] [timer.py:197:stop] 0/9120, RunningAvgSamplesPerSec=6.3401461025404995, CurrSamplesPerSec=5.6980246301305115, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:32:58,605] [INFO] [timer.py:197:stop] 0/9122, RunningAvgSamplesPerSec=6.340147576847911, CurrSamplesPerSec=5.700921422456217, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:33:09,905] [INFO] [timer.py:197:stop] 0/9124, RunningAvgSamplesPerSec=6.340147927191419, CurrSamplesPerSec=5.72497597811113, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:33:21,197] [INFO] [timer.py:197:stop] 0/9126, RunningAvgSamplesPerSec=6.340147353261547, CurrSamplesPerSec=5.711897272286692, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:33:32,479] [INFO] [timer.py:197:stop] 0/9128, RunningAvgSamplesPerSec=6.340149692615916, CurrSamplesPerSec=5.710304564092, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:33:43,740] [INFO] [timer.py:197:stop] 0/9130, RunningAvgSamplesPerSec=6.340155213278866, CurrSamplesPerSec=5.734318486158727, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:33:55,041] [INFO] [timer.py:197:stop] 0/9132, RunningAvgSamplesPerSec=6.340155343894294, CurrSamplesPerSec=5.669024083496483, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:34:06,316] [INFO] [timer.py:197:stop] 0/9134, RunningAvgSamplesPerSec=6.340156480313296, CurrSamplesPerSec=5.702799896105481, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:34:17,658] [INFO] [timer.py:197:stop] 0/9136, RunningAvgSamplesPerSec=6.340154367953798, CurrSamplesPerSec=5.681481253524794, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:34:29,045] [INFO] [timer.py:197:stop] 0/9138, RunningAvgSamplesPerSec=6.340143965937071, CurrSamplesPerSec=5.610906831033096, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:34:40,324] [INFO] [logging.py:68:log_dist] [Rank 0] step=4570, skipped=5, lr=[9.68888888888889e-07], mom=[[0.9, 0.999]] [2022-12-17 12:34:40,325] [INFO] [timer.py:197:stop] 0/9140, RunningAvgSamplesPerSec=6.340147034487641, CurrSamplesPerSec=5.708202402963382, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:34:51,575] [INFO] [timer.py:197:stop] 0/9142, RunningAvgSamplesPerSec=6.34015044490484, CurrSamplesPerSec=5.726932162951941, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:35:02,854] [INFO] [timer.py:197:stop] 0/9144, RunningAvgSamplesPerSec=6.34015603016963, CurrSamplesPerSec=5.721095659088879, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:35:14,179] [INFO] [timer.py:197:stop] 0/9146, RunningAvgSamplesPerSec=6.340152719323789, CurrSamplesPerSec=5.68065429785044, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:35:25,457] [INFO] [timer.py:197:stop] 0/9148, RunningAvgSamplesPerSec=6.340155277784047, CurrSamplesPerSec=5.706863374094032, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:35:36,737] [INFO] [timer.py:197:stop] 0/9150, RunningAvgSamplesPerSec=6.340158367560466, CurrSamplesPerSec=5.708289314711287, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 9.57777777777778e-07, 'epoch': 19.39} [2022-12-17 12:35:48,008] [INFO] [timer.py:197:stop] 0/9152, RunningAvgSamplesPerSec=6.340162253630085, CurrSamplesPerSec=5.718088684361913, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:35:59,294] [INFO] [timer.py:197:stop] 0/9154, RunningAvgSamplesPerSec=6.340162321852701, CurrSamplesPerSec=5.711533403312362, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:36:10,578] [INFO] [timer.py:197:stop] 0/9156, RunningAvgSamplesPerSec=6.340164153551402, CurrSamplesPerSec=5.707124722983757, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:36:21,870] [INFO] [timer.py:197:stop] 0/9158, RunningAvgSamplesPerSec=6.340164749961444, CurrSamplesPerSec=5.717264189529792, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:36:33,149] [INFO] [logging.py:68:log_dist] [Rank 0] step=4580, skipped=5, lr=[9.466666666666667e-07], mom=[[0.9, 0.999]] [2022-12-17 12:36:33,151] [INFO] [timer.py:197:stop] 0/9160, 
RunningAvgSamplesPerSec=6.340164648943723, CurrSamplesPerSec=5.696304019888085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:36:44,435] [INFO] [timer.py:197:stop] 0/9162, RunningAvgSamplesPerSec=6.340166541679684, CurrSamplesPerSec=5.689125685010388, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:36:55,813] [INFO] [timer.py:197:stop] 0/9164, RunningAvgSamplesPerSec=6.340167713844468, CurrSamplesPerSec=5.702747315920129, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:37:07,100] [INFO] [timer.py:197:stop] 0/9166, RunningAvgSamplesPerSec=6.34016700060534, CurrSamplesPerSec=5.7019269967660495, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:37:18,408] [INFO] [timer.py:197:stop] 0/9168, RunningAvgSamplesPerSec=6.340164707006909, CurrSamplesPerSec=5.666900997116174, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:37:29,697] [INFO] [timer.py:197:stop] 0/9170, RunningAvgSamplesPerSec=6.340164121910339, CurrSamplesPerSec=5.690992772376195, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:37:40,949] [INFO] [timer.py:197:stop] 0/9172, RunningAvgSamplesPerSec=6.340168160252227, CurrSamplesPerSec=5.71007766211458, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:37:52,256] [INFO] [timer.py:197:stop] 0/9174, RunningAvgSamplesPerSec=6.3401676752621166, CurrSamplesPerSec=5.684985319257441, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:38:03,683] [INFO] [timer.py:197:stop] 0/9176, RunningAvgSamplesPerSec=6.340150339663452, CurrSamplesPerSec=5.5800397327838285, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:38:14,979] [INFO] [timer.py:197:stop] 0/9178, RunningAvgSamplesPerSec=6.340148930847332, CurrSamplesPerSec=5.673131392868632, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:38:26,248] [INFO] [logging.py:68:log_dist] [Rank 0] step=4590, skipped=5, lr=[9.244444444444445e-07], mom=[[0.9, 0.999]] [2022-12-17 12:38:26,250] [INFO] [timer.py:197:stop] 0/9180, 
RunningAvgSamplesPerSec=6.340150298991688, CurrSamplesPerSec=5.698743410477561, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:38:37,552] [INFO] [timer.py:197:stop] 0/9182, RunningAvgSamplesPerSec=6.3401497455325915, CurrSamplesPerSec=5.7135380536078735, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:38:49,093] [INFO] [timer.py:197:stop] 0/9184, RunningAvgSamplesPerSec=6.3401491895422035, CurrSamplesPerSec=5.686618865268576, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:39:00,363] [INFO] [timer.py:197:stop] 0/9186, RunningAvgSamplesPerSec=6.340153255419131, CurrSamplesPerSec=5.7045084315268895, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:39:11,652] [INFO] [timer.py:197:stop] 0/9188, RunningAvgSamplesPerSec=6.340155155704845, CurrSamplesPerSec=5.7096839060888716, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:39:22,930] [INFO] [timer.py:197:stop] 0/9190, RunningAvgSamplesPerSec=6.340158223776878, CurrSamplesPerSec=5.719141996572548, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:39:34,228] [INFO] [timer.py:197:stop] 0/9192, RunningAvgSamplesPerSec=6.340159418782378, CurrSamplesPerSec=5.713817770997784, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:39:45,529] [INFO] [timer.py:197:stop] 0/9194, RunningAvgSamplesPerSec=6.340158826753314, CurrSamplesPerSec=5.699113879369092, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:39:56,806] [INFO] [timer.py:197:stop] 0/9196, RunningAvgSamplesPerSec=6.340156998854702, CurrSamplesPerSec=5.70595648256646, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:40:08,101] [INFO] [timer.py:197:stop] 0/9198, RunningAvgSamplesPerSec=6.340155925570966, CurrSamplesPerSec=5.7080890332365515, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:40:19,421] [INFO] [logging.py:68:log_dist] [Rank 0] step=4600, skipped=5, lr=[9.022222222222222e-07], mom=[[0.9, 0.999]] [2022-12-17 12:40:19,423] [INFO] [timer.py:197:stop] 0/9200, 
RunningAvgSamplesPerSec=6.34015258430215, CurrSamplesPerSec=5.692694480783714, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 9.022222222222222e-07, 'epoch': 19.49} [2022-12-17 12:40:30,728] [INFO] [timer.py:197:stop] 0/9202, RunningAvgSamplesPerSec=6.340152615043308, CurrSamplesPerSec=5.710397613692423, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:40:42,007] [INFO] [timer.py:197:stop] 0/9204, RunningAvgSamplesPerSec=6.340153859145852, CurrSamplesPerSec=5.711524410477892, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:40:53,276] [INFO] [timer.py:197:stop] 0/9206, RunningAvgSamplesPerSec=6.340159289461381, CurrSamplesPerSec=5.735856965083715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:41:04,547] [INFO] [timer.py:197:stop] 0/9208, RunningAvgSamplesPerSec=6.3401623345042974, CurrSamplesPerSec=5.711224746540432, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:41:15,941] [INFO] [timer.py:197:stop] 0/9210, RunningAvgSamplesPerSec=6.340157326838278, CurrSamplesPerSec=5.669542530462479, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:41:27,264] [INFO] [timer.py:197:stop] 0/9212, RunningAvgSamplesPerSec=6.340156287015333, CurrSamplesPerSec=5.69148555969936, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:41:38,552] [INFO] [timer.py:197:stop] 0/9214, RunningAvgSamplesPerSec=6.340158164960546, CurrSamplesPerSec=5.730159073735827, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:41:49,875] [INFO] [timer.py:197:stop] 0/9216, RunningAvgSamplesPerSec=6.340159228571226, CurrSamplesPerSec=5.7013219627797875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:42:01,219] [INFO] [timer.py:197:stop] 0/9218, RunningAvgSamplesPerSec=6.340153976221815, CurrSamplesPerSec=5.692182654384938, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:42:12,499] [INFO] [logging.py:68:log_dist] [Rank 0] step=4610, skipped=5, lr=[8.8e-07], mom=[[0.9, 0.999]] [2022-12-17 
12:42:12,500] [INFO] [timer.py:197:stop] 0/9220, RunningAvgSamplesPerSec=6.340153403841933, CurrSamplesPerSec=5.710089808444558, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:42:23,829] [INFO] [timer.py:197:stop] 0/9222, RunningAvgSamplesPerSec=6.340147555697972, CurrSamplesPerSec=5.681393713237994, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:42:35,135] [INFO] [timer.py:197:stop] 0/9224, RunningAvgSamplesPerSec=6.340147942549898, CurrSamplesPerSec=5.702918629122964, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:42:46,412] [INFO] [timer.py:197:stop] 0/9226, RunningAvgSamplesPerSec=6.340149254212709, CurrSamplesPerSec=5.708072768570178, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:42:57,721] [INFO] [timer.py:197:stop] 0/9228, RunningAvgSamplesPerSec=6.340148567886221, CurrSamplesPerSec=5.713274414636991, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:43:08,988] [INFO] [timer.py:197:stop] 0/9230, RunningAvgSamplesPerSec=6.340153230507024, CurrSamplesPerSec=5.716390752374001, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:43:20,259] [INFO] [timer.py:197:stop] 0/9232, RunningAvgSamplesPerSec=6.340157138416786, CurrSamplesPerSec=5.729755939087335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:43:31,756] [INFO] [timer.py:197:stop] 0/9234, RunningAvgSamplesPerSec=6.34015718668598, CurrSamplesPerSec=5.6950486183439, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:43:43,022] [INFO] [timer.py:197:stop] 0/9236, RunningAvgSamplesPerSec=6.340160196742237, CurrSamplesPerSec=5.720007986497109, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:43:54,300] [INFO] [timer.py:197:stop] 0/9238, RunningAvgSamplesPerSec=6.3401630097358055, CurrSamplesPerSec=5.715744429759032, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:44:05,634] [INFO] [logging.py:68:log_dist] [Rank 0] step=4620, skipped=5, lr=[8.577777777777778e-07], mom=[[0.9, 0.999]] [2022-12-17 12:44:05,636] 
[INFO] [timer.py:197:stop] 0/9240, RunningAvgSamplesPerSec=6.340158126370072, CurrSamplesPerSec=5.682443893595855, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:44:16,933] [INFO] [timer.py:197:stop] 0/9242, RunningAvgSamplesPerSec=6.340158936768335, CurrSamplesPerSec=5.700625775529919, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:44:28,218] [INFO] [timer.py:197:stop] 0/9244, RunningAvgSamplesPerSec=6.3401615269069636, CurrSamplesPerSec=5.719573618214001, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:44:39,517] [INFO] [timer.py:197:stop] 0/9246, RunningAvgSamplesPerSec=6.340161629533877, CurrSamplesPerSec=5.72056944795954, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:44:50,799] [INFO] [timer.py:197:stop] 0/9248, RunningAvgSamplesPerSec=6.340161049942304, CurrSamplesPerSec=5.71187539508658, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:45:02,081] [INFO] [timer.py:197:stop] 0/9250, RunningAvgSamplesPerSec=6.340161406104777, CurrSamplesPerSec=5.69962477472045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 8.466666666666668e-07, 'epoch': 19.6} [2022-12-17 12:45:13,396] [INFO] [timer.py:197:stop] 0/9252, RunningAvgSamplesPerSec=6.3401600092698365, CurrSamplesPerSec=5.68532630648655, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:45:24,715] [INFO] [timer.py:197:stop] 0/9254, RunningAvgSamplesPerSec=6.340158751716345, CurrSamplesPerSec=5.694735700092881, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:45:36,020] [INFO] [timer.py:197:stop] 0/9256, RunningAvgSamplesPerSec=6.340158815250996, CurrSamplesPerSec=5.703624102116573, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:45:47,319] [INFO] [timer.py:197:stop] 0/9258, RunningAvgSamplesPerSec=6.34015930739022, CurrSamplesPerSec=5.7091872339697884, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:45:58,600] [INFO] [logging.py:68:log_dist] [Rank 0] step=4630, skipped=5, 
lr=[8.355555555555556e-07], mom=[[0.9, 0.999]] [2022-12-17 12:45:58,601] [INFO] [timer.py:197:stop] 0/9260, RunningAvgSamplesPerSec=6.340159115245336, CurrSamplesPerSec=5.70457510672883, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:46:09,916] [INFO] [timer.py:197:stop] 0/9262, RunningAvgSamplesPerSec=6.3401568534304005, CurrSamplesPerSec=5.684320319201461, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:46:21,205] [INFO] [timer.py:197:stop] 0/9264, RunningAvgSamplesPerSec=6.340159812195485, CurrSamplesPerSec=5.72261851024271, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:46:32,485] [INFO] [timer.py:197:stop] 0/9266, RunningAvgSamplesPerSec=6.340162767843043, CurrSamplesPerSec=5.7171287856386295, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:46:43,801] [INFO] [timer.py:197:stop] 0/9268, RunningAvgSamplesPerSec=6.340160434655951, CurrSamplesPerSec=5.700246636884618, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:46:55,126] [INFO] [timer.py:197:stop] 0/9270, RunningAvgSamplesPerSec=6.340157542329263, CurrSamplesPerSec=5.70487746802184, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:47:06,461] [INFO] [timer.py:197:stop] 0/9272, RunningAvgSamplesPerSec=6.340155016075218, CurrSamplesPerSec=5.674236332603981, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:47:17,758] [INFO] [timer.py:197:stop] 0/9274, RunningAvgSamplesPerSec=6.340155935623798, CurrSamplesPerSec=5.690579205791651, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:47:29,030] [INFO] [timer.py:197:stop] 0/9276, RunningAvgSamplesPerSec=6.340157051769977, CurrSamplesPerSec=5.704070838432285, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:47:40,303] [INFO] [timer.py:197:stop] 0/9278, RunningAvgSamplesPerSec=6.340160005413446, CurrSamplesPerSec=5.721028597144834, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:47:51,671] [INFO] [logging.py:68:log_dist] [Rank 0] step=4640, skipped=5, 
lr=[8.133333333333333e-07], mom=[[0.9, 0.999]] [2022-12-17 12:47:51,672] [INFO] [timer.py:197:stop] 0/9280, RunningAvgSamplesPerSec=6.340162249396748, CurrSamplesPerSec=5.72207738202556, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:48:03,000] [INFO] [timer.py:197:stop] 0/9282, RunningAvgSamplesPerSec=6.3401583228653395, CurrSamplesPerSec=5.687393574085657, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:48:14,284] [INFO] [timer.py:197:stop] 0/9284, RunningAvgSamplesPerSec=6.340160924269768, CurrSamplesPerSec=5.718414894341315, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:48:25,575] [INFO] [timer.py:197:stop] 0/9286, RunningAvgSamplesPerSec=6.340162293247801, CurrSamplesPerSec=5.72458724545319, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:48:36,883] [INFO] [timer.py:197:stop] 0/9288, RunningAvgSamplesPerSec=6.340161675242742, CurrSamplesPerSec=5.710935319988416, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:48:48,180] [INFO] [timer.py:197:stop] 0/9290, RunningAvgSamplesPerSec=6.340161733304246, CurrSamplesPerSec=5.698499038685744, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:48:59,439] [INFO] [timer.py:197:stop] 0/9292, RunningAvgSamplesPerSec=6.340164248884015, CurrSamplesPerSec=5.731903872013158, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:49:10,738] [INFO] [timer.py:197:stop] 0/9294, RunningAvgSamplesPerSec=6.340163637335254, CurrSamplesPerSec=5.700694539154415, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:49:22,054] [INFO] [timer.py:197:stop] 0/9296, RunningAvgSamplesPerSec=6.340161429995194, CurrSamplesPerSec=5.711878312036912, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:49:33,345] [INFO] [timer.py:197:stop] 0/9298, RunningAvgSamplesPerSec=6.340161402231518, CurrSamplesPerSec=5.715301217270755, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:49:44,665] [INFO] [logging.py:68:log_dist] [Rank 0] step=4650, skipped=5, 
lr=[7.911111111111111e-07], mom=[[0.9, 0.999]] [2022-12-17 12:49:44,666] [INFO] [timer.py:197:stop] 0/9300, RunningAvgSamplesPerSec=6.34015879446203, CurrSamplesPerSec=5.687757025009477, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 7.911111111111111e-07, 'epoch': 19.7} [2022-12-17 12:49:55,963] [INFO] [timer.py:197:stop] 0/9302, RunningAvgSamplesPerSec=6.340159094918814, CurrSamplesPerSec=5.705109291970435, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:50:07,237] [INFO] [timer.py:197:stop] 0/9304, RunningAvgSamplesPerSec=6.340162530518254, CurrSamplesPerSec=5.725340585042284, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:50:18,508] [INFO] [timer.py:197:stop] 0/9306, RunningAvgSamplesPerSec=6.340164719469053, CurrSamplesPerSec=5.717111982365678, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:50:29,827] [INFO] [timer.py:197:stop] 0/9308, RunningAvgSamplesPerSec=6.340163259169347, CurrSamplesPerSec=5.684242802166537, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:50:41,100] [INFO] [timer.py:197:stop] 0/9310, RunningAvgSamplesPerSec=6.340167314338288, CurrSamplesPerSec=5.725055586914081, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:50:52,364] [INFO] [timer.py:197:stop] 0/9312, RunningAvgSamplesPerSec=6.340169757275838, CurrSamplesPerSec=5.718095505399457, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:51:03,633] [INFO] [timer.py:197:stop] 0/9314, RunningAvgSamplesPerSec=6.34017535552695, CurrSamplesPerSec=5.7352133403440115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:51:14,926] [INFO] [timer.py:197:stop] 0/9316, RunningAvgSamplesPerSec=6.340176615455552, CurrSamplesPerSec=5.713768149566863, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:51:26,259] [INFO] [timer.py:197:stop] 0/9318, RunningAvgSamplesPerSec=6.340175941131643, CurrSamplesPerSec=5.683389532316705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:51:37,560] 
[INFO] [logging.py:68:log_dist] [Rank 0] step=4660, skipped=5, lr=[7.688888888888891e-07], mom=[[0.9, 0.999]] [2022-12-17 12:51:37,562] [INFO] [timer.py:197:stop] 0/9320, RunningAvgSamplesPerSec=6.3401753738723485, CurrSamplesPerSec=5.697270722570649, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:51:48,845] [INFO] [timer.py:197:stop] 0/9322, RunningAvgSamplesPerSec=6.3401783804499825, CurrSamplesPerSec=5.707954791824532, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:52:00,169] [INFO] [timer.py:197:stop] 0/9324, RunningAvgSamplesPerSec=6.340176036842008, CurrSamplesPerSec=5.676642689896154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:52:11,437] [INFO] [timer.py:197:stop] 0/9326, RunningAvgSamplesPerSec=6.340181403984736, CurrSamplesPerSec=5.719313321363565, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:52:22,736] [INFO] [timer.py:197:stop] 0/9328, RunningAvgSamplesPerSec=6.3401792682857, CurrSamplesPerSec=5.675037425141788, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:52:34,009] [INFO] [timer.py:197:stop] 0/9330, RunningAvgSamplesPerSec=6.3401827131228385, CurrSamplesPerSec=5.709366948867249, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:52:45,315] [INFO] [timer.py:197:stop] 0/9332, RunningAvgSamplesPerSec=6.34018176622586, CurrSamplesPerSec=5.687114028058006, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:52:56,714] [INFO] [timer.py:197:stop] 0/9334, RunningAvgSamplesPerSec=6.3401772493220765, CurrSamplesPerSec=5.684735865809839, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:53:08,032] [INFO] [timer.py:197:stop] 0/9336, RunningAvgSamplesPerSec=6.3401745095874045, CurrSamplesPerSec=5.7041876849764614, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:53:19,331] [INFO] [timer.py:197:stop] 0/9338, RunningAvgSamplesPerSec=6.340172921987333, CurrSamplesPerSec=5.710699863603781, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:53:30,643] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=4670, skipped=5, lr=[7.466666666666668e-07], mom=[[0.9, 0.999]] [2022-12-17 12:53:30,645] [INFO] [timer.py:197:stop] 0/9340, RunningAvgSamplesPerSec=6.34017212807068, CurrSamplesPerSec=5.681568796509319, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:53:41,940] [INFO] [timer.py:197:stop] 0/9342, RunningAvgSamplesPerSec=6.340172906628067, CurrSamplesPerSec=5.725945353710456, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:53:53,255] [INFO] [timer.py:197:stop] 0/9344, RunningAvgSamplesPerSec=6.3401717534789475, CurrSamplesPerSec=5.705783531440709, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:54:04,544] [INFO] [timer.py:197:stop] 0/9346, RunningAvgSamplesPerSec=6.3401735765286045, CurrSamplesPerSec=5.724853638836918, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:54:15,850] [INFO] [timer.py:197:stop] 0/9348, RunningAvgSamplesPerSec=6.340168712130555, CurrSamplesPerSec=5.684671338945308, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:54:27,137] [INFO] [timer.py:197:stop] 0/9350, RunningAvgSamplesPerSec=6.340168979319973, CurrSamplesPerSec=5.704432787366835, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 7.355555555555556e-07, 'epoch': 19.81} [2022-12-17 12:54:38,478] [INFO] [timer.py:197:stop] 0/9352, RunningAvgSamplesPerSec=6.340164046274461, CurrSamplesPerSec=5.680551876945872, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:54:49,801] [INFO] [timer.py:197:stop] 0/9354, RunningAvgSamplesPerSec=6.340160208992431, CurrSamplesPerSec=5.693153029852572, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:55:01,089] [INFO] [timer.py:197:stop] 0/9356, RunningAvgSamplesPerSec=6.340161150460569, CurrSamplesPerSec=5.701699064564827, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:55:12,396] [INFO] [timer.py:197:stop] 0/9358, RunningAvgSamplesPerSec=6.340158916026741, CurrSamplesPerSec=5.697199381370964, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:55:23,677] [INFO] [logging.py:68:log_dist] [Rank 0] step=4680, skipped=5, lr=[7.244444444444446e-07], mom=[[0.9, 0.999]] [2022-12-17 12:55:23,679] [INFO] [timer.py:197:stop] 0/9360, RunningAvgSamplesPerSec=6.34016184496023, CurrSamplesPerSec=5.720007742725363, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:55:34,926] [INFO] [timer.py:197:stop] 0/9362, RunningAvgSamplesPerSec=6.340168235538573, CurrSamplesPerSec=5.749692976105513, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:55:46,320] [INFO] [timer.py:197:stop] 0/9364, RunningAvgSamplesPerSec=6.3401557681292, CurrSamplesPerSec=5.607521341696767, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:55:57,593] [INFO] [timer.py:197:stop] 0/9366, RunningAvgSamplesPerSec=6.3401619815766015, CurrSamplesPerSec=5.740824544694479, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:56:08,921] [INFO] [timer.py:197:stop] 0/9368, RunningAvgSamplesPerSec=6.340162699156885, CurrSamplesPerSec=5.677324385946494, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:56:20,226] [INFO] [timer.py:197:stop] 0/9370, RunningAvgSamplesPerSec=6.340159813721217, CurrSamplesPerSec=5.69181477566186, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:56:31,543] [INFO] [timer.py:197:stop] 0/9372, RunningAvgSamplesPerSec=6.340156868598053, CurrSamplesPerSec=5.6598232614155615, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:56:42,837] [INFO] [timer.py:197:stop] 0/9374, RunningAvgSamplesPerSec=6.340157914042522, CurrSamplesPerSec=5.704102352588936, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:56:54,307] [INFO] [timer.py:197:stop] 0/9376, RunningAvgSamplesPerSec=6.3401579124363225, CurrSamplesPerSec=5.707784389579081, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 12:57:05,590] [INFO] [timer.py:197:stop] 0/9378, RunningAvgSamplesPerSec=6.340159614914527, CurrSamplesPerSec=5.71288119027779, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:57:16,873] [INFO] [logging.py:68:log_dist] [Rank 0] step=4690, skipped=5, lr=[7.022222222222223e-07], mom=[[0.9, 0.999]]
[2022-12-17 12:57:16,875] [INFO] [timer.py:197:stop] 0/9380, RunningAvgSamplesPerSec=6.340161045115699, CurrSamplesPerSec=5.714271214580349, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:57:28,212] [INFO] [timer.py:197:stop] 0/9382, RunningAvgSamplesPerSec=6.340157356492733, CurrSamplesPerSec=5.691090985175394, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:57:39,507] [INFO] [timer.py:197:stop] 0/9384, RunningAvgSamplesPerSec=6.340157980652823, CurrSamplesPerSec=5.712876326988239, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:57:50,792] [INFO] [timer.py:197:stop] 0/9386, RunningAvgSamplesPerSec=6.340157660006358, CurrSamplesPerSec=5.6946078846918695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:58:02,061] [INFO] [timer.py:197:stop] 0/9388, RunningAvgSamplesPerSec=6.3401619432927445, CurrSamplesPerSec=5.723264680143104, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:58:13,339] [INFO] [timer.py:197:stop] 0/9390, RunningAvgSamplesPerSec=6.340164537079148, CurrSamplesPerSec=5.731874497753184, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:58:24,624] [INFO] [timer.py:197:stop] 0/9392, RunningAvgSamplesPerSec=6.340169133071809, CurrSamplesPerSec=5.724244706348782, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:58:35,945] [INFO] [timer.py:197:stop] 0/9394, RunningAvgSamplesPerSec=6.3401673757243175, CurrSamplesPerSec=5.700526507026288, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:58:47,278] [INFO] [timer.py:197:stop] 0/9396, RunningAvgSamplesPerSec=6.340163410174969, CurrSamplesPerSec=5.68879750339704, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:58:58,578] [INFO] [timer.py:197:stop] 0/9398, RunningAvgSamplesPerSec=6.340164316536951, CurrSamplesPerSec=5.704421149981809, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:59:09,853] [INFO] [logging.py:68:log_dist] [Rank 0] step=4700, skipped=5, lr=[6.800000000000001e-07], mom=[[0.9, 0.999]]
[2022-12-17 12:59:09,855] [INFO] [timer.py:197:stop] 0/9400, RunningAvgSamplesPerSec=6.340164796341688, CurrSamplesPerSec=5.718083081378961, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0003, 'learning_rate': 6.800000000000001e-07, 'epoch': 19.92}
[2022-12-17 12:59:21,160] [INFO] [timer.py:197:stop] 0/9402, RunningAvgSamplesPerSec=6.340163757847574, CurrSamplesPerSec=5.702092688996432, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:59:32,460] [INFO] [timer.py:197:stop] 0/9404, RunningAvgSamplesPerSec=6.34016305691172, CurrSamplesPerSec=5.712065489245712, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:59:43,743] [INFO] [timer.py:197:stop] 0/9406, RunningAvgSamplesPerSec=6.3401647800453125, CurrSamplesPerSec=5.709922679480389, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 12:59:55,041] [INFO] [timer.py:197:stop] 0/9408, RunningAvgSamplesPerSec=6.340165372569974, CurrSamplesPerSec=5.702819523059122, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:00:06,357] [INFO] [timer.py:197:stop] 0/9410, RunningAvgSamplesPerSec=6.340163380924811, CurrSamplesPerSec=5.681837935931388, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:00:17,682] [INFO] [timer.py:197:stop] 0/9412, RunningAvgSamplesPerSec=6.340159713715301, CurrSamplesPerSec=5.696154135396421, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:00:28,954] [INFO] [timer.py:197:stop] 0/9414, RunningAvgSamplesPerSec=6.340163134519667, CurrSamplesPerSec=5.73100686895559, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:00:40,268] [INFO] [timer.py:197:stop] 0/9416, RunningAvgSamplesPerSec=6.340160765431248, CurrSamplesPerSec=5.705356656289844, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:00:51,568] [INFO] [timer.py:197:stop] 0/9418, RunningAvgSamplesPerSec=6.340159910382644, CurrSamplesPerSec=5.691176411349352, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:01:02,868] [INFO] [logging.py:68:log_dist] [Rank 0] step=4710, skipped=5, lr=[6.577777777777779e-07], mom=[[0.9, 0.999]]
[2022-12-17 13:01:02,870] [INFO] [timer.py:197:stop] 0/9420, RunningAvgSamplesPerSec=6.340158895996132, CurrSamplesPerSec=5.69437957089141, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:01:14,169] [INFO] [timer.py:197:stop] 0/9422, RunningAvgSamplesPerSec=6.340159772010291, CurrSamplesPerSec=5.704893956969019, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:01:25,476] [INFO] [timer.py:197:stop] 0/9424, RunningAvgSamplesPerSec=6.340158699659639, CurrSamplesPerSec=5.697029137049317, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:01:36,784] [INFO] [timer.py:197:stop] 0/9426, RunningAvgSamplesPerSec=6.34015888231509, CurrSamplesPerSec=5.7065775435697255, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:01:48,099] [INFO] [timer.py:197:stop] 0/9428, RunningAvgSamplesPerSec=6.340157566303402, CurrSamplesPerSec=5.6882292431636685, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:01:59,383] [INFO] [timer.py:197:stop] 0/9430, RunningAvgSamplesPerSec=6.340160039398767, CurrSamplesPerSec=5.715575509259925, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:02:10,680] [INFO] [timer.py:197:stop] 0/9432, RunningAvgSamplesPerSec=6.34015882945978, CurrSamplesPerSec=5.695347795512591, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:02:21,962] [INFO] [timer.py:197:stop] 0/9434, RunningAvgSamplesPerSec=6.3401592538599445, CurrSamplesPerSec=5.703278493058004, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:02:33,260] [INFO] [timer.py:197:stop] 0/9436, RunningAvgSamplesPerSec=6.340159679445317, CurrSamplesPerSec=5.71391117838237, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:02:44,507] [INFO] [timer.py:197:stop] 0/9438, RunningAvgSamplesPerSec=6.3401666888108155, CurrSamplesPerSec=5.731557519685642, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:02:53,004] [INFO] [logging.py:68:log_dist] [Rank 0] step=4720, skipped=5, lr=[6.355555555555556e-07], mom=[[0.9, 0.999]]
[2022-12-17 13:02:53,006] [INFO] [timer.py:197:stop] 0/9440, RunningAvgSamplesPerSec=6.340495323266648, CurrSamplesPerSec=10.172010687840247, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:03:04,333] [INFO] [timer.py:197:stop] 0/9442, RunningAvgSamplesPerSec=6.340491284144846, CurrSamplesPerSec=5.67950263607097, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:03:15,606] [INFO] [timer.py:197:stop] 0/9444, RunningAvgSamplesPerSec=6.340495059977157, CurrSamplesPerSec=5.720786211810857, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:03:26,942] [INFO] [timer.py:197:stop] 0/9446, RunningAvgSamplesPerSec=6.340490948343084, CurrSamplesPerSec=5.663866573890551, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:03:38,290] [INFO] [timer.py:197:stop] 0/9448, RunningAvgSamplesPerSec=6.340490033332966, CurrSamplesPerSec=5.692823417606247, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:03:49,604] [INFO] [timer.py:197:stop] 0/9450, RunningAvgSamplesPerSec=6.340487672942547, CurrSamplesPerSec=5.679561758239619, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0003, 'learning_rate': 6.244444444444445e-07, 'epoch': 20.02}
[2022-12-17 13:04:00,945] [INFO] [timer.py:197:stop] 0/9452, RunningAvgSamplesPerSec=6.340482910496316, CurrSamplesPerSec=5.684355467373568, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:04:12,236] [INFO] [timer.py:197:stop] 0/9454, RunningAvgSamplesPerSec=6.34048233937946, CurrSamplesPerSec=5.706420324723856, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:04:23,487] [INFO] [timer.py:197:stop] 0/9456, RunningAvgSamplesPerSec=6.340486824333612, CurrSamplesPerSec=5.7355645464489875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:04:34,766] [INFO] [timer.py:197:stop] 0/9458, RunningAvgSamplesPerSec=6.34048767797844, CurrSamplesPerSec=5.698039628069896, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:04:46,018] [INFO] [logging.py:68:log_dist] [Rank 0] step=4730, skipped=5, lr=[6.133333333333333e-07], mom=[[0.9, 0.999]]
[2022-12-17 13:04:46,020] [INFO] [timer.py:197:stop] 0/9460, RunningAvgSamplesPerSec=6.340492541433014, CurrSamplesPerSec=5.735454008695971, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:04:57,292] [INFO] [timer.py:197:stop] 0/9462, RunningAvgSamplesPerSec=6.340493819560258, CurrSamplesPerSec=5.710972499087195, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:05:08,606] [INFO] [timer.py:197:stop] 0/9464, RunningAvgSamplesPerSec=6.34049219161994, CurrSamplesPerSec=5.696880907027384, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:05:19,873] [INFO] [timer.py:197:stop] 0/9466, RunningAvgSamplesPerSec=6.340496758985929, CurrSamplesPerSec=5.7248873365583535, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:05:31,147] [INFO] [timer.py:197:stop] 0/9468, RunningAvgSamplesPerSec=6.340498220648656, CurrSamplesPerSec=5.710878215779455, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:05:42,436] [INFO] [timer.py:197:stop] 0/9470, RunningAvgSamplesPerSec=6.340497412838961, CurrSamplesPerSec=5.704533404238147, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:05:53,775] [INFO] [timer.py:197:stop] 0/9472, RunningAvgSamplesPerSec=6.340491861896453, CurrSamplesPerSec=5.657351254286605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:06:05,054] [INFO] [timer.py:197:stop] 0/9474, RunningAvgSamplesPerSec=6.340493855901343, CurrSamplesPerSec=5.727688808571485, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:06:16,331] [INFO] [timer.py:197:stop] 0/9476, RunningAvgSamplesPerSec=6.340496368385586, CurrSamplesPerSec=5.70959112263088, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:06:27,652] [INFO] [timer.py:197:stop] 0/9478, RunningAvgSamplesPerSec=6.34049401426013, CurrSamplesPerSec=5.69632650324217, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:06:38,925] [INFO] [logging.py:68:log_dist] [Rank 0] step=4740, skipped=5, lr=[5.911111111111111e-07], mom=[[0.9, 0.999]]
[2022-12-17 13:06:38,927] [INFO] [timer.py:197:stop] 0/9480, RunningAvgSamplesPerSec=6.340497114118926, CurrSamplesPerSec=5.710877486796936, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:06:50,210] [INFO] [timer.py:197:stop] 0/9482, RunningAvgSamplesPerSec=6.340499386506941, CurrSamplesPerSec=5.705616897005835, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:07:01,460] [INFO] [timer.py:197:stop] 0/9484, RunningAvgSamplesPerSec=6.340505909929958, CurrSamplesPerSec=5.74067329032332, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:07:12,722] [INFO] [timer.py:197:stop] 0/9486, RunningAvgSamplesPerSec=6.340509912791447, CurrSamplesPerSec=5.723496536483985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:07:24,014] [INFO] [timer.py:197:stop] 0/9488, RunningAvgSamplesPerSec=6.340508227847923, CurrSamplesPerSec=5.67338486498578, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:07:35,313] [INFO] [timer.py:197:stop] 0/9490, RunningAvgSamplesPerSec=6.3405131586608485, CurrSamplesPerSec=5.720401704983039, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:07:46,561] [INFO] [timer.py:197:stop] 0/9492, RunningAvgSamplesPerSec=6.340517223218006, CurrSamplesPerSec=5.7341834981743185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:07:57,791] [INFO] [timer.py:197:stop] 0/9494, RunningAvgSamplesPerSec=6.340525965766664, CurrSamplesPerSec=5.73762363116758, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:08:09,054] [INFO] [timer.py:197:stop] 0/9496, RunningAvgSamplesPerSec=6.340529051682444, CurrSamplesPerSec=5.731980736084711, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:08:20,372] [INFO] [timer.py:197:stop] 0/9498, RunningAvgSamplesPerSec=6.340527546537201, CurrSamplesPerSec=5.694462438228334, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:08:31,637] [INFO] [logging.py:68:log_dist] [Rank 0] step=4750, skipped=5, lr=[5.68888888888889e-07], mom=[[0.9, 0.999]]
[2022-12-17 13:08:31,638] [INFO] [timer.py:197:stop] 0/9500, RunningAvgSamplesPerSec=6.340532537658078, CurrSamplesPerSec=5.741129778677454, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0003, 'learning_rate': 5.68888888888889e-07, 'epoch': 20.13}
[2022-12-17 13:08:42,927] [INFO] [timer.py:197:stop] 0/9502, RunningAvgSamplesPerSec=6.34053487353236, CurrSamplesPerSec=5.709036670739029, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:08:54,189] [INFO] [timer.py:197:stop] 0/9504, RunningAvgSamplesPerSec=6.340539638784326, CurrSamplesPerSec=5.706077530334509, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:09:05,432] [INFO] [timer.py:197:stop] 0/9506, RunningAvgSamplesPerSec=6.340543448414384, CurrSamplesPerSec=5.7210751745464705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:09:16,749] [INFO] [timer.py:197:stop] 0/9508, RunningAvgSamplesPerSec=6.340541355451662, CurrSamplesPerSec=5.687145114146095, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:09:28,055] [INFO] [timer.py:197:stop] 0/9510, RunningAvgSamplesPerSec=6.3405402884819715, CurrSamplesPerSec=5.715332125522094, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:09:39,399] [INFO] [timer.py:197:stop] 0/9512, RunningAvgSamplesPerSec=6.340534225994082, CurrSamplesPerSec=5.662470382483525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:09:50,688] [INFO] [timer.py:197:stop] 0/9514, RunningAvgSamplesPerSec=6.340533705118219, CurrSamplesPerSec=5.699211404662409, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:10:01,949] [INFO] [timer.py:197:stop] 0/9516, RunningAvgSamplesPerSec=6.340536044772183, CurrSamplesPerSec=5.72302137321579, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:10:13,239] [INFO] [timer.py:197:stop] 0/9518, RunningAvgSamplesPerSec=6.340536540174654, CurrSamplesPerSec=5.723779914847205, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:10:24,570] [INFO] [logging.py:68:log_dist] [Rank 0] step=4760, skipped=5, lr=[5.466666666666667e-07], mom=[[0.9, 0.999]]
[2022-12-17 13:10:24,572] [INFO] [timer.py:197:stop] 0/9520, RunningAvgSamplesPerSec=6.3405322503302814, CurrSamplesPerSec=5.682484792575845, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:10:35,875] [INFO] [timer.py:197:stop] 0/9522, RunningAvgSamplesPerSec=6.3405315251666785, CurrSamplesPerSec=5.702043755493652, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:10:47,179] [INFO] [timer.py:197:stop] 0/9524, RunningAvgSamplesPerSec=6.340530874797334, CurrSamplesPerSec=5.703819707289909, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:10:58,484] [INFO] [timer.py:197:stop] 0/9526, RunningAvgSamplesPerSec=6.340529822737708, CurrSamplesPerSec=5.689387582292352, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:11:09,804] [INFO] [timer.py:197:stop] 0/9528, RunningAvgSamplesPerSec=6.340528232367791, CurrSamplesPerSec=5.692070885675692, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:11:21,133] [INFO] [timer.py:197:stop] 0/9530, RunningAvgSamplesPerSec=6.340525136810104, CurrSamplesPerSec=5.682145108321376, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:11:32,444] [INFO] [timer.py:197:stop] 0/9532, RunningAvgSamplesPerSec=6.340524288841033, CurrSamplesPerSec=5.710917824109433, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:11:43,753] [INFO] [timer.py:197:stop] 0/9534, RunningAvgSamplesPerSec=6.3405241001501285, CurrSamplesPerSec=5.6932148514914065, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:11:55,061] [INFO] [timer.py:197:stop] 0/9536, RunningAvgSamplesPerSec=6.340523159885818, CurrSamplesPerSec=5.70686944041714, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:12:06,347] [INFO] [timer.py:197:stop] 0/9538, RunningAvgSamplesPerSec=6.340524138848755, CurrSamplesPerSec=5.707079100468829, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:12:17,633] [INFO] [logging.py:68:log_dist] [Rank 0] step=4770, skipped=5, lr=[5.244444444444445e-07], mom=[[0.9, 0.999]]
[2022-12-17 13:12:17,634] [INFO] [timer.py:197:stop] 0/9540, RunningAvgSamplesPerSec=6.340528219980291, CurrSamplesPerSec=5.720528974189757, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:12:28,913] [INFO] [timer.py:197:stop] 0/9542, RunningAvgSamplesPerSec=6.340531213151512, CurrSamplesPerSec=5.716052358086121, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:12:40,206] [INFO] [timer.py:197:stop] 0/9544, RunningAvgSamplesPerSec=6.340532220953055, CurrSamplesPerSec=5.7087142007328815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:12:51,456] [INFO] [timer.py:197:stop] 0/9546, RunningAvgSamplesPerSec=6.340539098378529, CurrSamplesPerSec=5.724794302509103, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:13:02,844] [INFO] [timer.py:197:stop] 0/9548, RunningAvgSamplesPerSec=6.340537904218002, CurrSamplesPerSec=5.713264443529819, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:13:14,112] [INFO] [timer.py:197:stop] 0/9550, RunningAvgSamplesPerSec=6.340542398990786, CurrSamplesPerSec=5.715218229052894, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0003, 'learning_rate': 5.133333333333334e-07, 'epoch': 20.23}
[2022-12-17 13:13:25,382] [INFO] [timer.py:197:stop] 0/9552, RunningAvgSamplesPerSec=6.3405453106084675, CurrSamplesPerSec=5.710071103117879, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:13:36,655] [INFO] [timer.py:197:stop] 0/9554, RunningAvgSamplesPerSec=6.3405490647928, CurrSamplesPerSec=5.730727178909781, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:13:47,927] [INFO] [timer.py:197:stop] 0/9556, RunningAvgSamplesPerSec=6.340553842919432, CurrSamplesPerSec=5.7252531532934965, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:13:59,181] [INFO] [timer.py:197:stop] 0/9558, RunningAvgSamplesPerSec=6.340556461041022, CurrSamplesPerSec=5.717279532461573, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:14:10,382] [INFO] [logging.py:68:log_dist] [Rank 0] step=4780, skipped=5, lr=[5.022222222222222e-07], mom=[[0.9, 0.999]]
[2022-12-17 13:14:10,383] [INFO] [timer.py:197:stop] 0/9560, RunningAvgSamplesPerSec=6.340566945139066, CurrSamplesPerSec=5.766731215163039, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:14:21,647] [INFO] [timer.py:197:stop] 0/9562, RunningAvgSamplesPerSec=6.340572386150829, CurrSamplesPerSec=5.734300356731472, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:14:32,907] [INFO] [timer.py:197:stop] 0/9564, RunningAvgSamplesPerSec=6.340577394733058, CurrSamplesPerSec=5.722657793597098, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:14:44,137] [INFO] [timer.py:197:stop] 0/9566, RunningAvgSamplesPerSec=6.34058489836147, CurrSamplesPerSec=5.747707676283041, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:14:55,419] [INFO] [timer.py:197:stop] 0/9568, RunningAvgSamplesPerSec=6.340586714162821, CurrSamplesPerSec=5.719065963822951, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:15:06,726] [INFO] [timer.py:197:stop] 0/9570, RunningAvgSamplesPerSec=6.340587735441634, CurrSamplesPerSec=5.710436486486199, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:15:18,005] [INFO] [timer.py:197:stop] 0/9572, RunningAvgSamplesPerSec=6.340588577229494, CurrSamplesPerSec=5.709074796483692, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:15:29,241] [INFO] [timer.py:197:stop] 0/9574, RunningAvgSamplesPerSec=6.340596598155912, CurrSamplesPerSec=5.740768560004644, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:15:40,507] [INFO] [timer.py:197:stop] 0/9576, RunningAvgSamplesPerSec=6.3406009572890945, CurrSamplesPerSec=5.72478258189847, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:15:51,771] [INFO] [timer.py:197:stop] 0/9578, RunningAvgSamplesPerSec=6.340606231269369, CurrSamplesPerSec=5.709940412176869, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:16:03,031] [INFO] [logging.py:68:log_dist] [Rank 0] step=4790, skipped=5, lr=[4.800000000000001e-07], mom=[[0.9, 0.999]]
[2022-12-17 13:16:03,033] [INFO] [timer.py:197:stop] 0/9580, RunningAvgSamplesPerSec=6.340611041332799, CurrSamplesPerSec=5.738459406638726, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:16:14,353] [INFO] [timer.py:197:stop] 0/9582, RunningAvgSamplesPerSec=6.340610755927412, CurrSamplesPerSec=5.708338355468865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:16:25,645] [INFO] [timer.py:197:stop] 0/9584, RunningAvgSamplesPerSec=6.340613416570764, CurrSamplesPerSec=5.708610765511207, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:16:36,934] [INFO] [timer.py:197:stop] 0/9586, RunningAvgSamplesPerSec=6.340618523642324, CurrSamplesPerSec=5.722949629686229, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:16:48,199] [INFO] [timer.py:197:stop] 0/9588, RunningAvgSamplesPerSec=6.3406237510816394, CurrSamplesPerSec=5.724449785421197, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:16:59,446] [INFO] [timer.py:197:stop] 0/9590, RunningAvgSamplesPerSec=6.340630585642387, CurrSamplesPerSec=5.754374182993225, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:17:10,803] [INFO] [timer.py:197:stop] 0/9592, RunningAvgSamplesPerSec=6.340623210045512, CurrSamplesPerSec=5.702429432409602, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:17:22,070] [INFO] [timer.py:197:stop] 0/9594, RunningAvgSamplesPerSec=6.3406282526534286, CurrSamplesPerSec=5.7228803281207705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:17:33,367] [INFO] [timer.py:197:stop] 0/9596, RunningAvgSamplesPerSec=6.340630603611362, CurrSamplesPerSec=5.723601488161577, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:17:44,622] [INFO] [timer.py:197:stop] 0/9598, RunningAvgSamplesPerSec=6.340631843955927, CurrSamplesPerSec=5.724446123166047, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:17:55,858] [INFO] [logging.py:68:log_dist] [Rank 0] step=4800, skipped=5, lr=[4.5777777777777784e-07], mom=[[0.9, 0.999]]
[2022-12-17 13:17:55,859] [INFO] [timer.py:197:stop] 0/9600, RunningAvgSamplesPerSec=6.340637832959923, CurrSamplesPerSec=5.731995668516094, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0003, 'learning_rate': 4.5777777777777784e-07, 'epoch': 20.34}
[2022-12-17 13:18:07,153] [INFO] [timer.py:197:stop] 0/9602, RunningAvgSamplesPerSec=6.340640425084899, CurrSamplesPerSec=5.713372181970451, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:18:18,399] [INFO] [timer.py:197:stop] 0/9604, RunningAvgSamplesPerSec=6.34064592644466, CurrSamplesPerSec=5.708614650334416, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:18:29,611] [INFO] [timer.py:197:stop] 0/9606, RunningAvgSamplesPerSec=6.340654324601269, CurrSamplesPerSec=5.739666031481032, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:18:40,908] [INFO] [timer.py:197:stop] 0/9608, RunningAvgSamplesPerSec=6.340654018292477, CurrSamplesPerSec=5.700855559061133, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:18:52,175] [INFO] [timer.py:197:stop] 0/9610, RunningAvgSamplesPerSec=6.34065773956752, CurrSamplesPerSec=5.725173538996036, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:19:03,445] [INFO] [timer.py:197:stop] 0/9612, RunningAvgSamplesPerSec=6.34066179110196, CurrSamplesPerSec=5.728430251233179, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:19:14,699] [INFO] [timer.py:197:stop] 0/9614, RunningAvgSamplesPerSec=6.340667541111354, CurrSamplesPerSec=5.733595602853228, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:19:25,955] [INFO] [timer.py:197:stop] 0/9616, RunningAvgSamplesPerSec=6.340673556128894, CurrSamplesPerSec=5.7353750907785255, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:19:37,225] [INFO] [timer.py:197:stop] 0/9618, RunningAvgSamplesPerSec=6.340677615136063, CurrSamplesPerSec=5.725240942367918, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:19:48,470] [INFO] [logging.py:68:log_dist] [Rank 0] step=4810, skipped=5, lr=[4.355555555555556e-07], mom=[[0.9, 0.999]]
[2022-12-17 13:19:48,472] [INFO] [timer.py:197:stop] 0/9620, RunningAvgSamplesPerSec=6.340680049567392, CurrSamplesPerSec=5.722354764544266, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:19:59,793] [INFO] [timer.py:197:stop] 0/9622, RunningAvgSamplesPerSec=6.340678324203684, CurrSamplesPerSec=5.682935923325895, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:20:11,140] [INFO] [timer.py:197:stop] 0/9624, RunningAvgSamplesPerSec=6.340671645715207, CurrSamplesPerSec=5.643411426373468, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:20:22,407] [INFO] [timer.py:197:stop] 0/9626, RunningAvgSamplesPerSec=6.340675981803199, CurrSamplesPerSec=5.716829263846316, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:20:33,636] [INFO] [timer.py:197:stop] 0/9628, RunningAvgSamplesPerSec=6.340684535616191, CurrSamplesPerSec=5.735601801817971, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:20:44,887] [INFO] [timer.py:197:stop] 0/9630, RunningAvgSamplesPerSec=6.340690407688014, CurrSamplesPerSec=5.740077680962519, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:20:56,211] [INFO] [timer.py:197:stop] 0/9632, RunningAvgSamplesPerSec=6.340686808726378, CurrSamplesPerSec=5.70311176257181, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:21:07,481] [INFO] [timer.py:197:stop] 0/9634, RunningAvgSamplesPerSec=6.340688182776883, CurrSamplesPerSec=5.714741032735308, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:21:18,788] [INFO] [timer.py:197:stop] 0/9636, RunningAvgSamplesPerSec=6.340689150997104, CurrSamplesPerSec=5.701154862332323, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:21:30,068] [INFO] [timer.py:197:stop] 0/9638, RunningAvgSamplesPerSec=6.340689748082774, CurrSamplesPerSec=5.7023276784080865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:21:41,358] [INFO] [logging.py:68:log_dist] [Rank 0] step=4820, skipped=5, lr=[4.133333333333334e-07], mom=[[0.9, 0.999]]
[2022-12-17 13:21:41,360] [INFO] [timer.py:197:stop] 0/9640, RunningAvgSamplesPerSec=6.340691169588604, CurrSamplesPerSec=5.722409902708005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:21:52,645] [INFO] [timer.py:197:stop] 0/9642, RunningAvgSamplesPerSec=6.340693182885456, CurrSamplesPerSec=5.718977748871354, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:22:03,898] [INFO] [timer.py:197:stop] 0/9644, RunningAvgSamplesPerSec=6.340695317813755, CurrSamplesPerSec=5.735734160332338, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:22:15,172] [INFO] [timer.py:197:stop] 0/9646, RunningAvgSamplesPerSec=6.340698505054053, CurrSamplesPerSec=5.725852529541198, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:22:26,496] [INFO] [timer.py:197:stop] 0/9648, RunningAvgSamplesPerSec=6.340696415651516, CurrSamplesPerSec=5.709592579939372, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:22:37,825] [INFO] [timer.py:197:stop] 0/9650, RunningAvgSamplesPerSec=6.340692969068761, CurrSamplesPerSec=5.689682788244413, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0003, 'learning_rate': 4.0222222222222224e-07, 'epoch': 20.44}
[2022-12-17 13:22:49,132] [INFO] [timer.py:197:stop] 0/9652, RunningAvgSamplesPerSec=6.3406922739664155, CurrSamplesPerSec=5.686778127201508, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:23:00,392] [INFO] [timer.py:197:stop] 0/9654, RunningAvgSamplesPerSec=6.340693233482764, CurrSamplesPerSec=5.707892164175741, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:23:11,681] [INFO] [timer.py:197:stop] 0/9656, RunningAvgSamplesPerSec=6.340694915186403, CurrSamplesPerSec=5.702618898345167, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:23:22,958] [INFO] [timer.py:197:stop] 0/9658, RunningAvgSamplesPerSec=6.340696549072195, CurrSamplesPerSec=5.713376073276843, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:23:34,305] [INFO] [logging.py:68:log_dist] [Rank 0] step=4830, skipped=5, lr=[3.9111111111111115e-07], mom=[[0.9, 0.999]]
[2022-12-17 13:23:34,307] [INFO] [timer.py:197:stop] 0/9660, RunningAvgSamplesPerSec=6.34068979430831, CurrSamplesPerSec=5.641418446918364, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:23:45,571] [INFO] [timer.py:197:stop] 0/9662, RunningAvgSamplesPerSec=6.340692254869143, CurrSamplesPerSec=5.731172297788931, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:23:56,829] [INFO] [timer.py:197:stop] 0/9664, RunningAvgSamplesPerSec=6.340697391418571, CurrSamplesPerSec=5.741566199321276, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:24:08,156] [INFO] [timer.py:197:stop] 0/9666, RunningAvgSamplesPerSec=6.340694087279765, CurrSamplesPerSec=5.677179821010574, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:24:19,489] [INFO] [timer.py:197:stop] 0/9668, RunningAvgSamplesPerSec=6.34068814444375, CurrSamplesPerSec=5.676951461835517, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:24:30,789] [INFO] [timer.py:197:stop] 0/9670, RunningAvgSamplesPerSec=6.340686576979262, CurrSamplesPerSec=5.686611637253284, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:24:42,123] [INFO] [timer.py:197:stop] 0/9672, RunningAvgSamplesPerSec=6.340682922114108, CurrSamplesPerSec=5.673882282916007, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:24:53,463] [INFO] [timer.py:197:stop] 0/9674, RunningAvgSamplesPerSec=6.340678183937235, CurrSamplesPerSec=5.659441894190174, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:25:04,793] [INFO] [timer.py:197:stop] 0/9676, RunningAvgSamplesPerSec=6.340674849212192, CurrSamplesPerSec=5.671807087681444, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:25:16,103] [INFO] [timer.py:197:stop] 0/9678, RunningAvgSamplesPerSec=6.340673436963481, CurrSamplesPerSec=5.701336493716402, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:25:27,468] [INFO] [logging.py:68:log_dist] [Rank 0] step=4840, skipped=5, lr=[3.6888888888888893e-07], mom=[[0.9, 0.999]]
[2022-12-17 13:25:27,469] [INFO] [timer.py:197:stop] 0/9680, RunningAvgSamplesPerSec=6.340664750330204, CurrSamplesPerSec=5.628685730796901, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:25:38,774] [INFO] [timer.py:197:stop] 0/9682, RunningAvgSamplesPerSec=6.340664072347203, CurrSamplesPerSec=5.694476934220253, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:25:50,093] [INFO] [timer.py:197:stop] 0/9684, RunningAvgSamplesPerSec=6.340661364452313, CurrSamplesPerSec=5.674273035404013, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:26:01,484] [INFO] [timer.py:197:stop] 0/9686, RunningAvgSamplesPerSec=6.3406516390929175, CurrSamplesPerSec=5.6439826334632945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:26:12,815] [INFO] [timer.py:197:stop] 0/9688, RunningAvgSamplesPerSec=6.340648878300982, CurrSamplesPerSec=5.679193105947094, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:26:24,108] [INFO] [timer.py:197:stop] 0/9690, RunningAvgSamplesPerSec=6.34064983052059, CurrSamplesPerSec=5.707535843863806, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:26:35,408] [INFO] [timer.py:197:stop] 0/9692, RunningAvgSamplesPerSec=6.340648029931281, CurrSamplesPerSec=5.687791733628178, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:26:46,714] [INFO] [timer.py:197:stop] 0/9694, RunningAvgSamplesPerSec=6.340645924950521, CurrSamplesPerSec=5.679688899105148, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:26:58,001] [INFO] [timer.py:197:stop] 0/9696, RunningAvgSamplesPerSec=6.340645967687609, CurrSamplesPerSec=5.679108519733628, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:27:09,299] [INFO] [timer.py:197:stop] 0/9698, RunningAvgSamplesPerSec=6.3406472989057505, CurrSamplesPerSec=5.712760339986075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:27:20,604] [INFO] [logging.py:68:log_dist] [Rank 0] step=4850, skipped=5, lr=[3.466666666666667e-07], mom=[[0.9, 0.999]]
[2022-12-17 13:27:20,606] [INFO] [timer.py:197:stop] 0/9700, RunningAvgSamplesPerSec=6.340648066938377, CurrSamplesPerSec=5.688963156833746, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0003, 'learning_rate': 3.466666666666667e-07, 'epoch': 20.55}
[2022-12-17 13:27:31,925] [INFO] [timer.py:197:stop] 0/9702, RunningAvgSamplesPerSec=6.340644614174266, CurrSamplesPerSec=5.69075485566726, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:27:43,348] [INFO] [timer.py:197:stop] 0/9704, RunningAvgSamplesPerSec=6.340641727402441, CurrSamplesPerSec=5.684615240287061, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:27:54,685] [INFO] [timer.py:197:stop] 0/9706, RunningAvgSamplesPerSec=6.340637493457199, CurrSamplesPerSec=5.689613084076415, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:28:05,987] [INFO] [timer.py:197:stop] 0/9708, RunningAvgSamplesPerSec=6.3406360614694695, CurrSamplesPerSec=5.70824682958766, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:28:17,259] [INFO] [timer.py:197:stop] 0/9710, RunningAvgSamplesPerSec=6.340639036202918, CurrSamplesPerSec=5.72194516517078, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:28:28,566] [INFO] [timer.py:197:stop] 0/9712, RunningAvgSamplesPerSec=6.340639594011932, CurrSamplesPerSec=5.708637959384705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:28:39,868] [INFO] [timer.py:197:stop] 0/9714, RunningAvgSamplesPerSec=6.340638689416856, CurrSamplesPerSec=5.715024031439892, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:28:51,161] [INFO] [timer.py:197:stop] 0/9716, RunningAvgSamplesPerSec=6.340639113239258, CurrSamplesPerSec=5.6984850060904595, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:29:02,418] [INFO] [timer.py:197:stop] 0/9718, RunningAvgSamplesPerSec=6.340642093499504, CurrSamplesPerSec=5.738537673397883, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:29:13,689] [INFO] [logging.py:68:log_dist] [Rank 0] step=4860, skipped=5, lr=[3.2444444444444447e-07], mom=[[0.9, 0.999]]
[2022-12-17 13:29:13,690] [INFO] [timer.py:197:stop] 0/9720, RunningAvgSamplesPerSec=6.340644162013655, CurrSamplesPerSec=5.708898742362166, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:29:25,006] [INFO] [timer.py:197:stop] 0/9722, RunningAvgSamplesPerSec=6.340642235669002, CurrSamplesPerSec=5.686858604853792, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:29:36,299] [INFO] [timer.py:197:stop] 0/9724, RunningAvgSamplesPerSec=6.340642705245801, CurrSamplesPerSec=5.6947743599454475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:29:47,574] [INFO] [timer.py:197:stop] 0/9726, RunningAvgSamplesPerSec=6.340645698121382, CurrSamplesPerSec=5.712487533780942, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:29:58,862] [INFO] [timer.py:197:stop] 0/9728, RunningAvgSamplesPerSec=6.340647725574473, CurrSamplesPerSec=5.708532098978827, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:30:10,148] [INFO] [timer.py:197:stop] 0/9730, RunningAvgSamplesPerSec=6.340650054631584, CurrSamplesPerSec=5.690818555554048, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:30:21,405] [INFO] [timer.py:197:stop] 0/9732, RunningAvgSamplesPerSec=6.340652812405148, CurrSamplesPerSec=5.715923340598525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:30:32,703] [INFO] [timer.py:197:stop] 0/9734, RunningAvgSamplesPerSec=6.340653639196751, CurrSamplesPerSec=5.699098875774059, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:30:43,977] [INFO] [timer.py:197:stop] 0/9736, RunningAvgSamplesPerSec=6.340657923589717, CurrSamplesPerSec=5.703578535551888, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:30:55,247] [INFO] [timer.py:197:stop] 0/9738, RunningAvgSamplesPerSec=6.3406613332583, CurrSamplesPerSec=5.716524660457944, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:31:06,551] [INFO] [logging.py:68:log_dist] [Rank 0] step=4870, skipped=5, lr=[3.0222222222222225e-07], mom=[[0.9, 0.999]]
[2022-12-17 13:31:06,553] [INFO] [timer.py:197:stop] 0/9740, RunningAvgSamplesPerSec=6.340660083477266, CurrSamplesPerSec=5.70610736854931, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:31:17,824] [INFO] [timer.py:197:stop] 0/9742, RunningAvgSamplesPerSec=6.340663747945913, CurrSamplesPerSec=5.729440418168294, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:31:29,111] [INFO] [timer.py:197:stop] 0/9744, RunningAvgSamplesPerSec=6.340663146217899, CurrSamplesPerSec=5.710007943180077, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:31:40,451] [INFO] [timer.py:197:stop] 0/9746, RunningAvgSamplesPerSec=6.3406584805563115, CurrSamplesPerSec=5.691350167412572, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:31:51,768] [INFO] [timer.py:197:stop] 0/9748, RunningAvgSamplesPerSec=6.340656735987837, CurrSamplesPerSec=5.690681747327503, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:32:03,069] [INFO] [timer.py:197:stop] 0/9750, RunningAvgSamplesPerSec=6.340654303692601, CurrSamplesPerSec=5.686689218910566, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0003, 'learning_rate': 2.9111111111111116e-07, 'epoch': 20.66}
[2022-12-17 13:32:14,345] [INFO] [timer.py:197:stop] 0/9752, RunningAvgSamplesPerSec=6.340657376788333, CurrSamplesPerSec=5.725202844614887, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:32:25,650] [INFO] [timer.py:197:stop] 0/9754, RunningAvgSamplesPerSec=6.340656556812783, CurrSamplesPerSec=5.690480528132339, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:32:36,961] [INFO] [timer.py:197:stop] 0/9756, RunningAvgSamplesPerSec=6.340655173128515, CurrSamplesPerSec=5.7164781571213945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:32:48,279] [INFO] [timer.py:197:stop] 0/9758, RunningAvgSamplesPerSec=6.340652418983041, CurrSamplesPerSec=5.6877141218856835, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:32:59,606] [INFO] [logging.py:68:log_dist] [Rank 0] step=4880, skipped=5, lr=[2.8e-07], mom=[[0.9, 0.999]]
[2022-12-17 13:32:59,608] [INFO] [timer.py:197:stop] 0/9760, RunningAvgSamplesPerSec=6.340648595255576, CurrSamplesPerSec=5.672382619376683, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:33:10,911] [INFO] [timer.py:197:stop] 0/9762, RunningAvgSamplesPerSec=6.34064813649814, CurrSamplesPerSec=5.704121018830548, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:33:22,203] [INFO] [timer.py:197:stop] 0/9764, RunningAvgSamplesPerSec=6.340646665771141, CurrSamplesPerSec=5.699006194026974, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:33:33,517] [INFO] [timer.py:197:stop] 0/9766, RunningAvgSamplesPerSec=6.3406455096689545, CurrSamplesPerSec=5.711425005086199, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:33:44,847] [INFO] [timer.py:197:stop] 0/9768, RunningAvgSamplesPerSec=6.340641344518732, CurrSamplesPerSec=5.690235899501322, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:33:56,142] [INFO] [timer.py:197:stop] 0/9770, RunningAvgSamplesPerSec=6.340640091729031, CurrSamplesPerSec=5.7135424315821295, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:34:07,454] [INFO] [timer.py:197:stop] 0/9772, RunningAvgSamplesPerSec=6.340638459983171, CurrSamplesPerSec=5.700314422978219, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:34:18,735] [INFO] [timer.py:197:stop] 0/9774, RunningAvgSamplesPerSec=6.3406391161081554, CurrSamplesPerSec=5.708320147168445, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:34:30,037] [INFO] [timer.py:197:stop] 0/9776, RunningAvgSamplesPerSec=6.3406391673357545, CurrSamplesPerSec=5.71570937915154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:34:41,318] [INFO] [timer.py:197:stop] 0/9778, RunningAvgSamplesPerSec=6.340641663527955, CurrSamplesPerSec=5.721050056795874, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:34:52,627] [INFO] [logging.py:68:log_dist] [Rank 0] step=4890, skipped=5, lr=[2.577777777777778e-07], mom=[[0.9, 0.999]]
[2022-12-17 13:34:52,628] [INFO] [timer.py:197:stop] 0/9780, RunningAvgSamplesPerSec=6.3406400030490335, CurrSamplesPerSec=5.697978669195069, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:35:03,925] [INFO] [timer.py:197:stop] 0/9782, RunningAvgSamplesPerSec=6.34064038120064, CurrSamplesPerSec=5.702911844245921, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:35:15,185] [INFO] [timer.py:197:stop] 0/9784, RunningAvgSamplesPerSec=6.340640695137145, CurrSamplesPerSec=5.707185149671572, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:35:26,524] [INFO] [timer.py:197:stop] 0/9786, RunningAvgSamplesPerSec=6.340635597279186, CurrSamplesPerSec=5.680664876745854, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:35:37,820] [INFO] [timer.py:197:stop] 0/9788, RunningAvgSamplesPerSec=6.340635829083338, CurrSamplesPerSec=5.7053639320357314, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:35:49,138] [INFO] [timer.py:197:stop] 0/9790, RunningAvgSamplesPerSec=6.3406354978534845, CurrSamplesPerSec=5.703496372150302, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
[2022-12-17 13:36:00,446] [INFO] [timer.py:197:stop] 0/9792, RunningAvgSamplesPerSec=6.340636871356399, CurrSamplesPerSec=5.712987455221542, MemAllocated=3.0GB,
MaxMemAllocated=19.53GB [2022-12-17 13:36:11,795] [INFO] [timer.py:197:stop] 0/9794, RunningAvgSamplesPerSec=6.340633147060947, CurrSamplesPerSec=5.656989532874886, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:36:23,176] [INFO] [timer.py:197:stop] 0/9796, RunningAvgSamplesPerSec=6.340631705921807, CurrSamplesPerSec=5.691418948785352, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:36:34,507] [INFO] [timer.py:197:stop] 0/9798, RunningAvgSamplesPerSec=6.340626935726591, CurrSamplesPerSec=5.682324568171797, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:36:45,815] [INFO] [logging.py:68:log_dist] [Rank 0] step=4900, skipped=5, lr=[2.3555555555555556e-07], mom=[[0.9, 0.999]] [2022-12-17 13:36:45,816] [INFO] [timer.py:197:stop] 0/9800, RunningAvgSamplesPerSec=6.340622782064439, CurrSamplesPerSec=5.689119174059178, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 2.3555555555555556e-07, 'epoch': 20.76} [2022-12-17 13:36:57,143] [INFO] [timer.py:197:stop] 0/9802, RunningAvgSamplesPerSec=6.340620265372528, CurrSamplesPerSec=5.685125465837036, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:37:08,478] [INFO] [timer.py:197:stop] 0/9804, RunningAvgSamplesPerSec=6.34061955082465, CurrSamplesPerSec=5.71378006836133, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:37:19,836] [INFO] [timer.py:197:stop] 0/9806, RunningAvgSamplesPerSec=6.3406180901346625, CurrSamplesPerSec=5.7119497782507125, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:37:31,216] [INFO] [timer.py:197:stop] 0/9808, RunningAvgSamplesPerSec=6.340614073759756, CurrSamplesPerSec=5.673192061048671, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:37:42,606] [INFO] [timer.py:197:stop] 0/9810, RunningAvgSamplesPerSec=6.340608655859164, CurrSamplesPerSec=5.669230971978268, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:37:53,967] [INFO] [timer.py:197:stop] 0/9812, 
RunningAvgSamplesPerSec=6.340606529412413, CurrSamplesPerSec=5.6976468048680005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:38:05,336] [INFO] [timer.py:197:stop] 0/9814, RunningAvgSamplesPerSec=6.3406031791855275, CurrSamplesPerSec=5.703918120913467, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:38:16,681] [INFO] [timer.py:197:stop] 0/9816, RunningAvgSamplesPerSec=6.340602157159592, CurrSamplesPerSec=5.713201456120576, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:38:28,014] [INFO] [timer.py:197:stop] 0/9818, RunningAvgSamplesPerSec=6.340599736519701, CurrSamplesPerSec=5.718833247938542, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:38:39,341] [INFO] [logging.py:68:log_dist] [Rank 0] step=4910, skipped=5, lr=[2.1333333333333334e-07], mom=[[0.9, 0.999]] [2022-12-17 13:38:39,343] [INFO] [timer.py:197:stop] 0/9820, RunningAvgSamplesPerSec=6.340597847266216, CurrSamplesPerSec=5.714028671694887, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:38:50,720] [INFO] [timer.py:197:stop] 0/9822, RunningAvgSamplesPerSec=6.340593712928795, CurrSamplesPerSec=5.680739170553177, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:39:02,079] [INFO] [timer.py:197:stop] 0/9824, RunningAvgSamplesPerSec=6.340594575940506, CurrSamplesPerSec=5.711247590859075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:39:13,423] [INFO] [timer.py:197:stop] 0/9826, RunningAvgSamplesPerSec=6.340595406220354, CurrSamplesPerSec=5.713545836677856, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:39:24,773] [INFO] [timer.py:197:stop] 0/9828, RunningAvgSamplesPerSec=6.340595171840303, CurrSamplesPerSec=5.713653099270805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:39:36,105] [INFO] [timer.py:197:stop] 0/9830, RunningAvgSamplesPerSec=6.340596883021344, CurrSamplesPerSec=5.7223742823568395, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:39:47,522] [INFO] [timer.py:197:stop] 0/9832, 
RunningAvgSamplesPerSec=6.340594647328931, CurrSamplesPerSec=5.67701365247189, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:39:58,843] [INFO] [timer.py:197:stop] 0/9834, RunningAvgSamplesPerSec=6.340595410168788, CurrSamplesPerSec=5.699328294212952, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:40:10,152] [INFO] [timer.py:197:stop] 0/9836, RunningAvgSamplesPerSec=6.3405977092798675, CurrSamplesPerSec=5.733189045775647, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:40:21,517] [INFO] [timer.py:197:stop] 0/9838, RunningAvgSamplesPerSec=6.34059512603362, CurrSamplesPerSec=5.709437866507493, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:40:32,842] [INFO] [logging.py:68:log_dist] [Rank 0] step=4920, skipped=5, lr=[1.911111111111111e-07], mom=[[0.9, 0.999]] [2022-12-17 13:40:32,844] [INFO] [timer.py:197:stop] 0/9840, RunningAvgSamplesPerSec=6.3405952053865615, CurrSamplesPerSec=5.710939936974358, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:40:44,186] [INFO] [timer.py:197:stop] 0/9842, RunningAvgSamplesPerSec=6.340596580447692, CurrSamplesPerSec=5.737068135868828, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:40:55,521] [INFO] [timer.py:197:stop] 0/9844, RunningAvgSamplesPerSec=6.340598206431438, CurrSamplesPerSec=5.705499021755187, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:41:06,857] [INFO] [timer.py:197:stop] 0/9846, RunningAvgSamplesPerSec=6.340600054430399, CurrSamplesPerSec=5.7160389692080775, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:41:18,208] [INFO] [timer.py:197:stop] 0/9848, RunningAvgSamplesPerSec=6.340598016578841, CurrSamplesPerSec=5.6954347996193135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:41:29,554] [INFO] [timer.py:197:stop] 0/9850, RunningAvgSamplesPerSec=6.340597950058121, CurrSamplesPerSec=5.6924349346684275, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 1.8e-07, 'epoch': 20.87} [2022-12-17 
13:41:40,896] [INFO] [timer.py:197:stop] 0/9852, RunningAvgSamplesPerSec=6.340596690920647, CurrSamplesPerSec=5.671698993757138, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:41:52,145] [INFO] [timer.py:197:stop] 0/9854, RunningAvgSamplesPerSec=6.3406011234994395, CurrSamplesPerSec=5.714989233011362, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:42:03,433] [INFO] [timer.py:197:stop] 0/9856, RunningAvgSamplesPerSec=6.340602982802754, CurrSamplesPerSec=5.727709096073513, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:42:14,717] [INFO] [timer.py:197:stop] 0/9858, RunningAvgSamplesPerSec=6.340604893504226, CurrSamplesPerSec=5.727854779225058, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:42:26,001] [INFO] [logging.py:68:log_dist] [Rank 0] step=4930, skipped=5, lr=[1.6888888888888888e-07], mom=[[0.9, 0.999]] [2022-12-17 13:42:26,003] [INFO] [timer.py:197:stop] 0/9860, RunningAvgSamplesPerSec=6.3406086560517485, CurrSamplesPerSec=5.7084192017256035, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:42:37,376] [INFO] [timer.py:197:stop] 0/9862, RunningAvgSamplesPerSec=6.340601733325555, CurrSamplesPerSec=5.652896269703396, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:42:48,699] [INFO] [timer.py:197:stop] 0/9864, RunningAvgSamplesPerSec=6.340598766994051, CurrSamplesPerSec=5.678477086373248, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:43:00,014] [INFO] [timer.py:197:stop] 0/9866, RunningAvgSamplesPerSec=6.340596977908883, CurrSamplesPerSec=5.697784915718732, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:43:11,313] [INFO] [timer.py:197:stop] 0/9868, RunningAvgSamplesPerSec=6.34059902107909, CurrSamplesPerSec=5.710696218920705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:43:22,574] [INFO] [timer.py:197:stop] 0/9870, RunningAvgSamplesPerSec=6.340604113495977, CurrSamplesPerSec=5.732323956868721, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 
13:43:33,938] [INFO] [timer.py:197:stop] 0/9872, RunningAvgSamplesPerSec=6.340597750596586, CurrSamplesPerSec=5.650902772471108, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:43:45,269] [INFO] [timer.py:197:stop] 0/9874, RunningAvgSamplesPerSec=6.340595945296254, CurrSamplesPerSec=5.690554355105167, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:43:56,585] [INFO] [timer.py:197:stop] 0/9876, RunningAvgSamplesPerSec=6.3405946051103, CurrSamplesPerSec=5.69468689277867, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:44:07,900] [INFO] [timer.py:197:stop] 0/9878, RunningAvgSamplesPerSec=6.340593489179957, CurrSamplesPerSec=5.694379087707014, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:44:19,150] [INFO] [logging.py:68:log_dist] [Rank 0] step=4940, skipped=5, lr=[1.4666666666666668e-07], mom=[[0.9, 0.999]] [2022-12-17 13:44:19,151] [INFO] [timer.py:197:stop] 0/9880, RunningAvgSamplesPerSec=6.340597604830494, CurrSamplesPerSec=5.716830481351049, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:44:30,454] [INFO] [timer.py:197:stop] 0/9882, RunningAvgSamplesPerSec=6.340596721036361, CurrSamplesPerSec=5.6957004202676496, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:44:41,769] [INFO] [timer.py:197:stop] 0/9884, RunningAvgSamplesPerSec=6.340594316788758, CurrSamplesPerSec=5.682323846460689, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:44:53,052] [INFO] [timer.py:197:stop] 0/9886, RunningAvgSamplesPerSec=6.340596276162941, CurrSamplesPerSec=5.7212739295736, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:45:04,379] [INFO] [timer.py:197:stop] 0/9888, RunningAvgSamplesPerSec=6.34059408455692, CurrSamplesPerSec=5.689828473058532, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:45:15,650] [INFO] [timer.py:197:stop] 0/9890, RunningAvgSamplesPerSec=6.340597297284146, CurrSamplesPerSec=5.721333680717065, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:45:26,950] 
[INFO] [timer.py:197:stop] 0/9892, RunningAvgSamplesPerSec=6.340595498450684, CurrSamplesPerSec=5.704597655422528, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:45:38,258] [INFO] [timer.py:197:stop] 0/9894, RunningAvgSamplesPerSec=6.340593335221015, CurrSamplesPerSec=5.693914543849568, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:45:49,551] [INFO] [timer.py:197:stop] 0/9896, RunningAvgSamplesPerSec=6.340594562498777, CurrSamplesPerSec=5.721465869313914, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:46:00,863] [INFO] [timer.py:197:stop] 0/9898, RunningAvgSamplesPerSec=6.340592944529698, CurrSamplesPerSec=5.69145056464072, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:46:12,155] [INFO] [logging.py:68:log_dist] [Rank 0] step=4950, skipped=5, lr=[1.2444444444444446e-07], mom=[[0.9, 0.999]] [2022-12-17 13:46:12,157] [INFO] [timer.py:197:stop] 0/9900, RunningAvgSamplesPerSec=6.340591871089729, CurrSamplesPerSec=5.701376454174018, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 1.2444444444444446e-07, 'epoch': 20.97} [2022-12-17 13:46:23,459] [INFO] [timer.py:197:stop] 0/9902, RunningAvgSamplesPerSec=6.340590961739624, CurrSamplesPerSec=5.694413394002929, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:46:34,742] [INFO] [timer.py:197:stop] 0/9904, RunningAvgSamplesPerSec=6.340592714523383, CurrSamplesPerSec=5.719328187867814, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:46:46,073] [INFO] [timer.py:197:stop] 0/9906, RunningAvgSamplesPerSec=6.340586601472477, CurrSamplesPerSec=5.6506555873434285, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:46:57,649] [INFO] [timer.py:197:stop] 0/9908, RunningAvgSamplesPerSec=6.340583139550562, CurrSamplesPerSec=5.688344477258061, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:47:08,964] [INFO] [timer.py:197:stop] 0/9910, RunningAvgSamplesPerSec=6.340582063583655, CurrSamplesPerSec=5.69384062966408, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:47:17,445] [INFO] [timer.py:197:stop] 0/9912, RunningAvgSamplesPerSec=6.340896979420159, CurrSamplesPerSec=10.21189842496566, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:47:28,730] [INFO] [timer.py:197:stop] 0/9914, RunningAvgSamplesPerSec=6.3409001843020745, CurrSamplesPerSec=5.709035213714303, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:47:40,043] [INFO] [timer.py:197:stop] 0/9916, RunningAvgSamplesPerSec=6.340900307056376, CurrSamplesPerSec=5.717296093178765, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:47:51,324] [INFO] [timer.py:197:stop] 0/9918, RunningAvgSamplesPerSec=6.340903282977045, CurrSamplesPerSec=5.73046219594243, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:48:02,669] [INFO] [logging.py:68:log_dist] [Rank 0] step=4960, skipped=5, lr=[1.0222222222222224e-07], mom=[[0.9, 0.999]] [2022-12-17 13:48:02,671] [INFO] [timer.py:197:stop] 0/9920, RunningAvgSamplesPerSec=6.340898328388526, CurrSamplesPerSec=5.660043561397479, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:48:13,952] [INFO] [timer.py:197:stop] 0/9922, RunningAvgSamplesPerSec=6.340898483784882, CurrSamplesPerSec=5.702508899964778, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:48:25,254] [INFO] [timer.py:197:stop] 0/9924, RunningAvgSamplesPerSec=6.340898390789046, CurrSamplesPerSec=5.700757735356679, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:48:36,548] [INFO] [timer.py:197:stop] 0/9926, RunningAvgSamplesPerSec=6.340896976319998, CurrSamplesPerSec=5.693027458782009, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:48:47,837] [INFO] [timer.py:197:stop] 0/9928, RunningAvgSamplesPerSec=6.340898333657151, CurrSamplesPerSec=5.70214695287267, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:48:59,116] [INFO] [timer.py:197:stop] 0/9930, RunningAvgSamplesPerSec=6.340901462526044, CurrSamplesPerSec=5.718285282932952, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:49:10,404] [INFO] [timer.py:197:stop] 0/9932, RunningAvgSamplesPerSec=6.3409020831662275, CurrSamplesPerSec=5.71534161709944, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:49:21,711] [INFO] [timer.py:197:stop] 0/9934, RunningAvgSamplesPerSec=6.340902295046757, CurrSamplesPerSec=5.712443041122582, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:49:32,997] [INFO] [timer.py:197:stop] 0/9936, RunningAvgSamplesPerSec=6.340904588484402, CurrSamplesPerSec=5.711998395744774, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:49:44,255] [INFO] [timer.py:197:stop] 0/9938, RunningAvgSamplesPerSec=6.340910112012096, CurrSamplesPerSec=5.73411833383575, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:49:55,502] [INFO] [logging.py:68:log_dist] [Rank 0] step=4970, skipped=5, lr=[8e-08], mom=[[0.9, 0.999]] [2022-12-17 13:49:55,503] [INFO] [timer.py:197:stop] 0/9940, RunningAvgSamplesPerSec=6.3409167332964005, CurrSamplesPerSec=5.74774976626621, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:50:06,797] [INFO] [timer.py:197:stop] 0/9942, RunningAvgSamplesPerSec=6.340917796579706, CurrSamplesPerSec=5.698404682909532, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:50:18,103] [INFO] [timer.py:197:stop] 0/9944, RunningAvgSamplesPerSec=6.34091793494103, CurrSamplesPerSec=5.704016295522513, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:50:29,353] [INFO] [timer.py:197:stop] 0/9946, RunningAvgSamplesPerSec=6.340923511825759, CurrSamplesPerSec=5.728178681911909, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:50:40,607] [INFO] [timer.py:197:stop] 0/9948, RunningAvgSamplesPerSec=6.340929224906553, CurrSamplesPerSec=5.731069760153105, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:50:51,889] [INFO] [timer.py:197:stop] 0/9950, RunningAvgSamplesPerSec=6.340932002980195, CurrSamplesPerSec=5.713992669089812, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 6.888888888888889e-08, 'epoch': 21.08} [2022-12-17 13:51:03,181] [INFO] [timer.py:197:stop] 0/9952, RunningAvgSamplesPerSec=6.340932461337423, CurrSamplesPerSec=5.707336586028797, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:51:14,507] [INFO] [timer.py:197:stop] 0/9954, RunningAvgSamplesPerSec=6.340929661883157, CurrSamplesPerSec=5.705845627693179, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:51:25,809] [INFO] [timer.py:197:stop] 0/9956, RunningAvgSamplesPerSec=6.340930317921198, CurrSamplesPerSec=5.708943422655863, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:51:37,140] [INFO] [timer.py:197:stop] 0/9958, RunningAvgSamplesPerSec=6.340929734921853, CurrSamplesPerSec=5.696032299621633, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:51:48,475] [INFO] [logging.py:68:log_dist] [Rank 0] step=4980, skipped=5, lr=[5.777777777777778e-08], mom=[[0.9, 0.999]] [2022-12-17 13:51:48,477] [INFO] [timer.py:197:stop] 0/9960, RunningAvgSamplesPerSec=6.340925350607312, CurrSamplesPerSec=5.688403060397394, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:52:00,005] [INFO] [timer.py:197:stop] 0/9962, RunningAvgSamplesPerSec=6.3409241755697, CurrSamplesPerSec=5.708690162635773, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:52:11,338] [INFO] [timer.py:197:stop] 0/9964, RunningAvgSamplesPerSec=6.340920354302299, CurrSamplesPerSec=5.6829395326607095, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:52:22,674] [INFO] [timer.py:197:stop] 0/9966, RunningAvgSamplesPerSec=6.340916614696622, CurrSamplesPerSec=5.698118973321, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:52:34,000] [INFO] [timer.py:197:stop] 0/9968, RunningAvgSamplesPerSec=6.340914119441085, CurrSamplesPerSec=5.70057202483779, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:52:45,330] [INFO] [timer.py:197:stop] 0/9970, RunningAvgSamplesPerSec=6.340909146581982, 
CurrSamplesPerSec=5.697504105417921, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:52:56,611] [INFO] [timer.py:197:stop] 0/9972, RunningAvgSamplesPerSec=6.340909995964699, CurrSamplesPerSec=5.709983651268353, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:53:07,892] [INFO] [timer.py:197:stop] 0/9974, RunningAvgSamplesPerSec=6.34090860048114, CurrSamplesPerSec=5.708770776043156, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:53:19,190] [INFO] [timer.py:197:stop] 0/9976, RunningAvgSamplesPerSec=6.340909069212601, CurrSamplesPerSec=5.708913797600395, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:53:30,496] [INFO] [timer.py:197:stop] 0/9978, RunningAvgSamplesPerSec=6.340909470341716, CurrSamplesPerSec=5.7091665917667935, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:53:41,802] [INFO] [logging.py:68:log_dist] [Rank 0] step=4990, skipped=5, lr=[3.555555555555556e-08], mom=[[0.9, 0.999]] [2022-12-17 13:53:41,804] [INFO] [timer.py:197:stop] 0/9980, RunningAvgSamplesPerSec=6.340909563247144, CurrSamplesPerSec=5.706446284646033, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:53:53,132] [INFO] [timer.py:197:stop] 0/9982, RunningAvgSamplesPerSec=6.340906959207517, CurrSamplesPerSec=5.699439864073999, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:54:04,416] [INFO] [timer.py:197:stop] 0/9984, RunningAvgSamplesPerSec=6.3409092895430135, CurrSamplesPerSec=5.729085559804991, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:54:15,670] [INFO] [timer.py:197:stop] 0/9986, RunningAvgSamplesPerSec=6.34091422515384, CurrSamplesPerSec=5.727353230141937, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:54:26,952] [INFO] [timer.py:197:stop] 0/9988, RunningAvgSamplesPerSec=6.340917488526028, CurrSamplesPerSec=5.715415360428148, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:54:38,232] [INFO] [timer.py:197:stop] 0/9990, RunningAvgSamplesPerSec=6.340920287446701, 
CurrSamplesPerSec=5.71633670397908, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:54:49,484] [INFO] [timer.py:197:stop] 0/9992, RunningAvgSamplesPerSec=6.340926020804758, CurrSamplesPerSec=5.750438155929827, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:55:00,765] [INFO] [timer.py:197:stop] 0/9994, RunningAvgSamplesPerSec=6.340926852172472, CurrSamplesPerSec=5.695170895359499, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:55:12,071] [INFO] [timer.py:197:stop] 0/9996, RunningAvgSamplesPerSec=6.340927324768952, CurrSamplesPerSec=5.710692574242281, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:55:23,326] [INFO] [timer.py:197:stop] 0/9998, RunningAvgSamplesPerSec=6.340931445461435, CurrSamplesPerSec=5.722994042135287, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 13:55:34,598] [INFO] [logging.py:68:log_dist] [Rank 0] step=5000, skipped=5, lr=[1.3333333333333334e-08], mom=[[0.9, 0.999]] [2022-12-17 13:55:34,599] [INFO] [timer.py:197:stop] 0/10000, RunningAvgSamplesPerSec=6.340933607515229, CurrSamplesPerSec=5.7027141206342025, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 1.3333333333333334e-08, 'epoch': 21.19} {'eval_loss': 0.2120361328125, 'eval_wer': 9.045873924973758, 'eval_runtime': 2098.0347, 'eval_samples_per_second': 3.677, 'eval_steps_per_second': 0.46, 'epoch': 21.19} [2022-12-17 14:30:36,207] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step5000 is begin to save! [2022-12-17 14:30:36,217] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-5000/global_step5000/mp_rank_00_model_states.pt [2022-12-17 14:30:36,217] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving ./checkpoint-5000/global_step5000/mp_rank_00_model_states.pt... [2022-12-17 14:30:39,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved ./checkpoint-5000/global_step5000/mp_rank_00_model_states.pt. 
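The `lr` values in the `log_dist` lines above fall linearly toward zero as training approaches its last step. The sketch below reproduces several of them from the `WarmupDecayLR` settings printed in the DeepSpeed engine configuration dump after the checkpoint save (`warmup_min_lr=0`, `warmup_max_lr=1e-05`, `warmup_num_steps=500`, `total_num_steps=5000`). It is a hand-written reimplementation of the decay phase only, not DeepSpeed's own code. It assumes that the 5 skipped fp16-overflow steps (`skipped=5`) did not advance the scheduler, so the scheduler's iteration index is `step - skipped - 1`. `decay_lr` and `LOGGED` are hypothetical names used only for illustration.

```python
# Sketch (not DeepSpeed's implementation): reproduce the post-warmup lr values
# logged by DeepSpeed, assuming a linear decay
#   lr = min_lr + (max_lr - min_lr) * (total - it) / (total - warmup)
# and that skipped (fp16 overflow) steps do not step the scheduler.

WARMUP_MIN_LR = 0.0
WARMUP_MAX_LR = 1e-5
WARMUP_NUM_STEPS = 500
TOTAL_NUM_STEPS = 5000


def decay_lr(step: int, skipped: int) -> float:
    """Hypothetical helper: lr after `step` optimizer steps, `skipped` of them skipped."""
    # Assumed scheduler index: starts at -1 and advances only on non-skipped steps.
    it = step - skipped - 1
    if it < WARMUP_NUM_STEPS:
        # The warmup phase is not modeled in this sketch.
        raise ValueError("step falls in the warmup phase, which is not modeled here")
    gamma = max(0.0, (TOTAL_NUM_STEPS - it) / max(1.0, TOTAL_NUM_STEPS - WARMUP_NUM_STEPS))
    return WARMUP_MIN_LR + (WARMUP_MAX_LR - WARMUP_MIN_LR) * gamma


# (step, skipped, lr) triples copied from the log_dist lines above.
LOGGED = [
    (4860, 5, 3.2444444444444447e-07),
    (4900, 5, 2.3555555555555556e-07),
    (4970, 5, 8e-08),
    (5000, 5, 1.3333333333333334e-08),
]

for step, skipped, logged in LOGGED:
    predicted = decay_lr(step, skipped)
    print(f"step={step}: predicted={predicted:.6e} logged={logged:.6e}")
```

Under these assumptions the predicted and logged values agree to floating-point precision. Without the `skipped` correction every prediction would be shifted by five scheduler iterations (for example, step 5000 would give about 2.2e-09 instead of the logged 1.33e-08).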
[2022-12-17 14:30:39,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving ./checkpoint-5000/global_step5000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2022-12-17 14:30:54,827] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved ./checkpoint-5000/global_step5000/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2022-12-17 14:30:54,828] [INFO] [engine.py:3269:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-5000/global_step5000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-12-17 14:30:54,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! [2022-12-17 14:33:04,234] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.7, git-hash=unknown, git-branch=unknown [2022-12-17 14:33:04,295] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [2022-12-17 14:33:05,834] [WARNING] [cpu_adam.py:83:__init__] FP16 params for CPUAdam may not work on AMD CPUs Installed CUDA version 11.6 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Time to load cpu_adam op: 3.3625645637512207 seconds Adam Optimizer #1 is created with AVX2 arithmetic capability. 
Config: alpha=0.000010, betas=(0.900000, 0.999000), weight_decay=0.000000, adam_w=1 [2022-12-17 14:33:09,860] [INFO] [logging.py:68:log_dist] [Rank 0] Using DeepSpeed Optimizer param name adamw as basic optimizer [2022-12-17 14:33:10,163] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = DeepSpeedCPUAdam [2022-12-17 14:33:10,163] [INFO] [utils.py:52:is_zero_supported_optimizer] Checking ZeRO support for optimizer=DeepSpeedCPUAdam type= [2022-12-17 14:33:10,163] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 2 optimizer [2022-12-17 14:33:10,163] [INFO] [stage_1_and_2.py:140:__init__] Reduce bucket size 200000000 [2022-12-17 14:33:10,163] [INFO] [stage_1_and_2.py:141:__init__] Allgather bucket size 200000000 [2022-12-17 14:33:10,163] [INFO] [stage_1_and_2.py:142:__init__] CPU Offload: True [2022-12-17 14:33:10,163] [INFO] [stage_1_and_2.py:143:__init__] Round robin gradient partitioning: False Time to load utils op: 0.0004093647003173828 seconds Rank: 0 partition count [1] and sizes[(1543304960, False)] [2022-12-17 14:33:14,433] [INFO] [utils.py:827:see_memory_usage] Before initializing optimizer states [2022-12-17 14:33:14,434] [INFO] [utils.py:828:see_memory_usage] MA 6.0 GB Max_MA 19.53 GB CA 29.61 GB Max_CA 30 GB [2022-12-17 14:33:14,434] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 48.37 GB, percent = 24.6% [2022-12-17 14:33:20,694] [INFO] [utils.py:827:see_memory_usage] After initializing optimizer states [2022-12-17 14:33:20,694] [INFO] [utils.py:828:see_memory_usage] MA 6.0 GB Max_MA 6.0 GB CA 29.61 GB Max_CA 30 GB [2022-12-17 14:33:20,695] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 68.22 GB, percent = 34.7% [2022-12-17 14:33:20,695] [INFO] [stage_1_and_2.py:525:__init__] optimizer state initialized [2022-12-17 14:33:20,771] [INFO] [utils.py:827:see_memory_usage] After initializing ZeRO optimizer [2022-12-17 14:33:20,772] [INFO] [utils.py:828:see_memory_usage] MA 6.0 GB Max_MA 
6.0 GB CA 29.61 GB Max_CA 30 GB [2022-12-17 14:33:20,772] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 68.22 GB, percent = 34.7% [2022-12-17 14:33:20,801] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = adamw [2022-12-17 14:33:20,801] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using configured LR scheduler = WarmupDecayLR [2022-12-17 14:33:20,801] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2022-12-17 14:33:20,801] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-17 14:33:20,803] [INFO] [config.py:1020:print] DeepSpeedEngine configuration: [2022-12-17 14:33:20,803] [INFO] [config.py:1024:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2022-12-17 14:33:20,803] [INFO] [config.py:1024:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2022-12-17 14:33:20,803] [INFO] [config.py:1024:print] amp_enabled .................. False [2022-12-17 14:33:20,803] [INFO] [config.py:1024:print] amp_params ................... False [2022-12-17 14:33:20,804] [INFO] [config.py:1024:print] autotuning_config ............ 
{
    "enabled": false,
    "start_step": null,
    "end_step": null,
    "metric_path": null,
    "arg_mappings": null,
    "metric": "throughput",
    "model_info": null,
    "results_dir": "autotuning_results",
    "exps_dir": "autotuning_exps",
    "overwrite": true,
    "fast": true,
    "start_profile_step": 3,
    "end_profile_step": 5,
    "tuner_type": "gridsearch",
    "tuner_early_stopping": 5,
    "tuner_num_trials": 50,
    "model_info_path": null,
    "mp_size": 1,
    "max_train_batch_size": null,
    "min_train_batch_size": 1,
    "max_train_micro_batch_size_per_gpu": 1.024000e+03,
    "min_train_micro_batch_size_per_gpu": 1,
    "num_tuning_micro_batch_sizes": 3
}
[2022-12-17 14:33:20,804] [INFO] [config.py:1024:print] bfloat16_enabled ............. False
[2022-12-17 14:33:20,804] [INFO] [config.py:1024:print] checkpoint_parallel_write_pipeline False
[2022-12-17 14:33:20,804] [INFO] [config.py:1024:print] checkpoint_tag_validation_enabled True
[2022-12-17 14:33:20,804] [INFO] [config.py:1024:print] checkpoint_tag_validation_fail False
[2022-12-17 14:33:20,804] [INFO] [config.py:1024:print] comms_config .................
[2022-12-17 14:33:20,804] [INFO] [config.py:1024:print] communication_data_type ...... None
[2022-12-17 14:33:20,804] [INFO] [config.py:1024:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2022-12-17 14:33:20,804] [INFO] [config.py:1024:print] curriculum_enabled ........... False
[2022-12-17 14:33:20,804] [INFO] [config.py:1024:print] curriculum_params ............ False
[2022-12-17 14:33:20,804] [INFO] [config.py:1024:print] dataloader_drop_last ......... False
[2022-12-17 14:33:20,804] [INFO] [config.py:1024:print] disable_allgather ............ False
[2022-12-17 14:33:20,804] [INFO] [config.py:1024:print] dump_state ................... False
[2022-12-17 14:33:20,804] [INFO] [config.py:1024:print] dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 1000, 'delayed_shift': 2, 'min_scale': 1}
[2022-12-17 14:33:20,804] [INFO] [config.py:1024:print] eigenvalue_enabled ........... False
[2022-12-17 14:33:20,804] [INFO] [config.py:1024:print] eigenvalue_gas_boundary_resolution 1
[2022-12-17 14:33:20,804] [INFO] [config.py:1024:print] eigenvalue_layer_name ........ bert.encoder.layer
[2022-12-17 14:33:20,804] [INFO] [config.py:1024:print] eigenvalue_layer_num ......... 0
[2022-12-17 14:33:20,805] [INFO] [config.py:1024:print] eigenvalue_max_iter .......... 100
[2022-12-17 14:33:20,805] [INFO] [config.py:1024:print] eigenvalue_stability ......... 1e-06
[2022-12-17 14:33:20,805] [INFO] [config.py:1024:print] eigenvalue_tol ............... 0.01
[2022-12-17 14:33:20,805] [INFO] [config.py:1024:print] eigenvalue_verbose ........... False
[2022-12-17 14:33:20,805] [INFO] [config.py:1024:print] elasticity_enabled ........... False
[2022-12-17 14:33:20,805] [INFO] [config.py:1024:print] flops_profiler_config ........ {
    "enabled": false,
    "profile_step": 1,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": null
}
[2022-12-17 14:33:20,805] [INFO] [config.py:1024:print] fp16_auto_cast ............... False
[2022-12-17 14:33:20,805] [INFO] [config.py:1024:print] fp16_enabled ................. True
[2022-12-17 14:33:20,805] [INFO] [config.py:1024:print] fp16_master_weights_and_gradients False
[2022-12-17 14:33:20,805] [INFO] [config.py:1024:print] global_rank .................. 0
[2022-12-17 14:33:20,805] [INFO] [config.py:1024:print] grad_accum_dtype ............. None
[2022-12-17 14:33:20,805] [INFO] [config.py:1024:print] gradient_accumulation_steps .. 2
[2022-12-17 14:33:20,805] [INFO] [config.py:1024:print] gradient_clipping ............ 1.0
[2022-12-17 14:33:20,805] [INFO] [config.py:1024:print] gradient_predivide_factor .... 1.0
[2022-12-17 14:33:20,805] [INFO] [config.py:1024:print] initial_dynamic_scale ........ 65536
[2022-12-17 14:33:20,805] [INFO] [config.py:1024:print] load_universal_checkpoint .... False
[2022-12-17 14:33:20,805] [INFO] [config.py:1024:print] loss_scale ................... 0
[2022-12-17 14:33:20,805] [INFO] [config.py:1024:print] memory_breakdown ............. False
[2022-12-17 14:33:20,805] [INFO] [config.py:1024:print] monitor_config ...............
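With fp16 enabled and loss_scale set to 0, DeepSpeed uses dynamic loss scaling. The logged dynamic_loss_scale_args start the scale at 65536 (2^16, from initial_scale_power 16), raise it after scale_window = 1000 overflow-free steps, tolerate delayed_shift = 2 overflows (the "hysteresis" key in ds_config.json) before lowering it, and never let it drop below min_scale = 1. The class below is a simplified, hypothetical sketch of that update rule for illustration only. It is not DeepSpeed's implementation, and the factor of 2 per adjustment is an assumption.

```python
from dataclasses import dataclass


@dataclass
class DynamicLossScaler:
    """Simplified, hypothetical sketch of fp16 dynamic loss scaling with hysteresis.

    Not DeepSpeed's class; defaults mirror the logged dynamic_loss_scale_args.
    scale_factor=2.0 is an assumption.
    """

    init_scale: float = 65536.0  # 2**16, i.e. initial_scale_power = 16
    scale_window: int = 1000     # overflow-free steps before the scale is raised
    delayed_shift: int = 2       # overflows tolerated before the scale is lowered
    min_scale: float = 1.0       # floor for the scale
    scale_factor: float = 2.0    # assumed multiplier per adjustment

    def __post_init__(self) -> None:
        self.scale = self.init_scale
        self.hysteresis = self.delayed_shift
        self.good_steps = 0  # steps since the last overflow

    def update(self, overflow: bool) -> None:
        """Adjust the scale after one step; overflow=True means inf/NaN gradients were found."""
        if overflow:
            self.good_steps = 0
            if self.hysteresis <= 1:
                # Out of tolerance: shrink the scale, but never below min_scale.
                self.scale = max(self.scale / self.scale_factor, self.min_scale)
            else:
                # Spend one unit of hysteresis instead of shrinking immediately.
                self.hysteresis -= 1
        else:
            self.good_steps += 1
            if self.good_steps % self.scale_window == 0:
                # A full window without overflow: grow the scale and restore tolerance.
                self.scale *= self.scale_factor
                self.hysteresis = self.delayed_shift


scaler = DynamicLossScaler()
for _ in range(1000):
    scaler.update(overflow=False)
print(scaler.scale)
```

With these settings a single overflow only consumes hysteresis; a second overflow before the next full window halves the scale.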
[2022-12-17 14:33:20,805] [INFO] [config.py:1024:print] nebula_config ................ {
    "enabled": false,
    "persistent_storage_path": null,
    "persistent_time_interval": 100,
    "num_of_version_in_retention": 2,
    "enable_nebula_load": true,
    "load_path": null
}
[2022-12-17 14:33:20,805] [INFO] [config.py:1024:print] optimizer_legacy_fusion ...... False
[2022-12-17 14:33:20,805] [INFO] [config.py:1024:print] optimizer_name ............... adamw
[2022-12-17 14:33:20,805] [INFO] [config.py:1024:print] optimizer_params ............. {'lr': 1e-05, 'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0.0}
[2022-12-17 14:33:20,805] [INFO] [config.py:1024:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2022-12-17 14:33:20,805] [INFO] [config.py:1024:print] pld_enabled .................. False
[2022-12-17 14:33:20,806] [INFO] [config.py:1024:print] pld_params ................... False
[2022-12-17 14:33:20,806] [INFO] [config.py:1024:print] prescale_gradients ........... False
[2022-12-17 14:33:20,806] [INFO] [config.py:1024:print] scheduler_name ............... WarmupDecayLR
[2022-12-17 14:33:20,806] [INFO] [config.py:1024:print] scheduler_params ............. {'last_batch_iteration': -1, 'total_num_steps': 5000, 'warmup_min_lr': 0, 'warmup_max_lr': 1e-05, 'warmup_num_steps': 500}
[2022-12-17 14:33:20,806] [INFO] [config.py:1024:print] sparse_attention ............. None
[2022-12-17 14:33:20,806] [INFO] [config.py:1024:print] sparse_gradients_enabled ..... False
[2022-12-17 14:33:20,806] [INFO] [config.py:1024:print] steps_per_print .............. 10
[2022-12-17 14:33:20,806] [INFO] [config.py:1024:print] train_batch_size ............. 64
[2022-12-17 14:33:20,806] [INFO] [config.py:1024:print] train_micro_batch_size_per_gpu 32
[2022-12-17 14:33:20,806] [INFO] [config.py:1024:print] use_node_local_storage ....... False
[2022-12-17 14:33:20,806] [INFO] [config.py:1024:print] wall_clock_breakdown ......... False
[2022-12-17 14:33:20,806] [INFO] [config.py:1024:print] world_size ................... 1
[2022-12-17 14:33:20,806] [INFO] [config.py:1024:print] zero_allow_untested_optimizer False
[2022-12-17 14:33:20,806] [INFO] [config.py:1024:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=200000000 allgather_partitions=True allgather_bucket_size=200000000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='cpu', nvme_path=None, buffer_count=4, pin_memory=True, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False
[2022-12-17 14:33:20,806] [INFO] [config.py:1024:print] zero_enabled ................. True
[2022-12-17 14:33:20,806] [INFO] [config.py:1024:print] zero_optimization_stage ...... 2
[2022-12-17 14:33:20,806] [INFO] [config.py:1009:print_user_config] json = {
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": 1e-05,
            "betas": [0.9, 0.999],
            "eps": 1e-08,
            "weight_decay": 0.0
        }
    },
    "scheduler": {
        "type": "WarmupDecayLR",
        "params": {
            "last_batch_iteration": -1,
            "total_num_steps": 5.000000e+03,
            "warmup_min_lr": 0,
            "warmup_max_lr": 1e-05,
            "warmup_num_steps": 500
        }
    },
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": true
        },
        "allgather_partitions": true,
        "allgather_bucket_size": 2.000000e+08,
        "overlap_comm": true,
        "reduce_scatter": true,
        "reduce_bucket_size": 2.000000e+08,
        "contiguous_gradients": true
    },
    "gradient_accumulation_steps": 2,
    "gradient_clipping": 1.0,
    "train_batch_size": 64,
    "train_micro_batch_size_per_gpu": 32
}
Time to load utils op: 0.00031685829162597656 seconds
[2022-12-17 14:33:20,807] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from ./checkpoint-5000/global_step5000/mp_rank_00_model_states.pt...
[2022-12-17 14:33:21,825] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from ./checkpoint-5000/global_step5000/mp_rank_00_model_states.pt.
[2022-12-17 14:33:21,950] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from ./checkpoint-5000/global_step5000/mp_rank_00_model_states.pt...
[2022-12-17 14:33:23,082] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from ./checkpoint-5000/global_step5000/mp_rank_00_model_states.pt.
[2022-12-17 14:33:23,454] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from ./checkpoint-5000/global_step5000/zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2022-12-17 14:33:29,392] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from ./checkpoint-5000/global_step5000/zero_pp_rank_0_mp_rank_00_optim_states.pt.
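DeepSpeed requires train_batch_size to equal train_micro_batch_size_per_gpu × gradient_accumulation_steps × world size; here 32 × 2 × 1 = 64, matching the logged value. The scheduler_params describe a WarmupDecayLR schedule: ramp from warmup_min_lr to warmup_max_lr over warmup_num_steps, then decay linearly to zero at total_num_steps. The Python sketch below reproduces that arithmetic with the logged values. It is not DeepSpeed's code, and the `warmup_type` parameter (defaulting to a logarithmic ramp) is an assumption, since the config above does not set it.

```python
import math

# Values copied from the ds_config.json printed in the log.
MICRO_BATCH_PER_GPU = 32
GRAD_ACCUM_STEPS = 2
WORLD_SIZE = 1
TRAIN_BATCH_SIZE = 64

WARMUP_MIN_LR = 0.0
WARMUP_MAX_LR = 1e-5
WARMUP_NUM_STEPS = 500
TOTAL_NUM_STEPS = 5000


def effective_batch_size(micro: int, grad_accum: int, world: int) -> int:
    """Samples consumed per optimizer step."""
    return micro * grad_accum * world


def warmup_decay_lr(step: int, warmup_type: str = "log") -> float:
    """Sketch of a warmup-then-linear-decay schedule (assumed to mirror WarmupDecayLR).

    warmup_type is an assumption: 'log' ramps with log(step + 1), 'linear' with step.
    """
    delta = WARMUP_MAX_LR - WARMUP_MIN_LR
    if step < WARMUP_NUM_STEPS:
        if warmup_type == "log":
            gamma = math.log(step + 1) / math.log(WARMUP_NUM_STEPS)
        else:  # linear warmup
            gamma = step / WARMUP_NUM_STEPS
    else:
        # Linear decay from the peak LR at the end of warmup to 0 at TOTAL_NUM_STEPS.
        gamma = max(0.0, (TOTAL_NUM_STEPS - step) / max(1, TOTAL_NUM_STEPS - WARMUP_NUM_STEPS))
    return WARMUP_MIN_LR + delta * gamma


assert effective_batch_size(MICRO_BATCH_PER_GPU, GRAD_ACCUM_STEPS, WORLD_SIZE) == TRAIN_BATCH_SIZE

for step in (0, 250, 499, 500, 2750, 5000):
    print(step, f"{warmup_decay_lr(step):.3e}")
```

Under these assumptions the learning rate peaks at 1e-05 at the end of warmup and reaches 0 exactly at step 5000, which is also where max_steps stops training.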
[2022-12-17 14:33:29,392] [INFO] [engine.py:2900:_get_all_zero_checkpoint_state_dicts] successfully read 1 ZeRO state_dicts for rank 0
[2022-12-17 14:33:32,457] [INFO] [engine.py:2840:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 0
{'train_runtime': 67979.462, 'train_samples_per_second': 4.707, 'train_steps_per_second': 0.074, 'train_loss': 0.015194951736927033, 'epoch': 21.19}
12/17/2022 14:35:04 - WARNING - huggingface_hub.repository - Several commits (2) will be pushed upstream.
12/17/2022 14:35:04 - WARNING - huggingface_hub.repository - The progress bars may be unreliable.
12/17/2022 14:36:20 - WARNING - huggingface_hub.repository - remote: Scanning LFS files for validity, may be slow...
remote: LFS file scan complete.
To https://huggingface.co/mikr/whisper-large2-czech-cv11-v2
   409b7ab..826f494  main -> main

12/17/2022 14:37:37 - WARNING - huggingface_hub.repository - To https://huggingface.co/mikr/whisper-large2-czech-cv11-v2
   826f494..9271259  main -> main

***** train metrics *****
  epoch                    =       21.19
  train_loss               =      0.0152
  train_runtime            = 18:52:59.46
  train_samples_per_second =       4.707
  train_steps_per_second   =       0.074
12/17/2022 14:37:41 - INFO - __main__ - *** Evaluate ***
***** eval metrics *****
  epoch                   =      21.19
  eval_loss               =      0.212
  eval_runtime            = 0:35:04.81
  eval_samples_per_second =      3.665
  eval_steps_per_second   =      0.458
  eval_wer                =     9.0459
12/17/2022 15:14:25 - WARNING - huggingface_hub.repository - remote: Scanning LFS files for validity, may be slow...
remote: LFS file scan complete.
To https://huggingface.co/mikr/whisper-large2-czech-cv11-v2
   9271259..d5ff598  main -> main
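The reported training throughput is consistent with 5000 optimizer steps of 64 samples each over a train_runtime of 67979.462 s, assuming the Trainer counts max_steps × total train batch size as the number of samples. The helpers below are hypothetical (not part of transformers) and recompute the two rates, rounded to three decimals, plus the h:mm:ss.ss runtime strings shown in the metrics tables.

```python
# Hypothetical helpers for checking the logged speed metrics; not transformers APIs.
MAX_STEPS = 5000
TOTAL_TRAIN_BATCH_SIZE = 64   # 32 per GPU x 2 accumulation steps x 1 GPU
TRAIN_RUNTIME_S = 67979.462   # 'train_runtime' from the log


def speed_metrics(num_steps: int, batch_size: int, runtime_s: float) -> dict:
    """Samples/s and steps/s, rounded to 3 decimals as in the log."""
    samples = num_steps * batch_size
    return {
        "train_samples_per_second": round(samples / runtime_s, 3),
        "train_steps_per_second": round(num_steps / runtime_s, 3),
    }


def format_runtime(seconds: float) -> str:
    """Format seconds as h:mm:ss.hh (hundredths rounded)."""
    centis = round(seconds * 100)          # total hundredths of a second
    whole, frac = divmod(centis, 100)
    hours, rem = divmod(whole, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}.{frac:02d}"


print(speed_metrics(MAX_STEPS, TOTAL_TRAIN_BATCH_SIZE, TRAIN_RUNTIME_S))
print(format_runtime(TRAIN_RUNTIME_S))
```

The same formatter applied to the evaluation runtime of 2104.81 s yields the 0:35:04.81 shown under eval metrics.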